Abstract. Let $\Theta$ be a smooth compact oriented manifold without boundary, 
imbedded in a euclidean space $E^s$, and let $\gamma$ be a smooth map of $\Theta$ into a 
riemannian manifold $\Lambda$. An unknown state $\theta \in \Theta$ is observed via $X = \theta + \epsilon\xi$, 
where $\epsilon > 0$ is a small parameter and $\xi$ is a white Gaussian noise. For a 
given smooth prior $\lambda$ on $\Theta$ and smooth estimators $g(X)$ of the map $\gamma$ we have 
derived a second-order asymptotic expansion for the related Bayesian risk [3]. 
In this paper, we apply this technique to a variety of examples. 

The second part examines the first-order conditions for equality-constrained 
regression problems. The geometric tools that are utilised in [3] are naturally 
applicable to these regression problems. 



1. Introduction 

In many estimation problems, one has a state which lies on a manifold but one 
observes this state plus some error in a euclidean space. It is desirable to utilise 
the underlying geometry to construct an estimator of the state. The present paper 
uses a Bayesian approach and the Bayesian estimator derived in [3], and computes 
the estimator in a variety of examples. 

In many cases, the geometric framework of [3] naturally extends to regression 
problems. In an estimation problem, the map is known while the state is observed 
with noise and one attempts to infer the 'true' state; in a regression problem, the 
map is unknown and one observes the input-output states with some noise and 
attempts to infer the map. In this paper, we will assume that the regression map 
belongs to a given compact finite-dimensional manifold. In such a situation, one 
may formally transpose the regression problem in the sense that one may regard the 
map as the state that one observes with noise and the input-output states may be 
regarded as (evaluation) maps. This transposition is commonly used in topology 
and differential geometry. In the second part of this note, we derive first-order 
conditions for regression problems on manifolds. It is shown in several cases that 
this duality between estimation and regression is exact: the two viewpoints lead to 
the same estimator. 

Consider the following situation: $E$ is a real $s$-dimensional vector space with 
inner product $\sigma$ and $\Theta$ (resp. $\Lambda$) is a smooth manifold with riemannian metric $g$ 
(resp. $h$). Assume that the smooth riemannian manifold $(\Theta, g)$ is isometrically 
embedded in a euclidean space $(E, \sigma)$ via the inclusion map $\iota$, and $\gamma : \Theta \to \Lambda$ is a 
smooth map. Smooth means infinitely differentiable. These data are summarized 



Date: August 18, 2009. 

2000 Mathematics Subject Classification. Primary 62C10; Secondary 62C20, 62F12, 53B20, 
53C17, 70G45. 

Key words and phrases. Bayesian problems, Bayes estimators, Minimax estimators, riemannian 
geometry, sub-riemannian geometry, sub-laplacian, harmonic maps. 

The first author thanks the Carnegie Trust for the Universities of Scotland for supporting this 
research. 






BUTLER, LEVIT 



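by the diagram 

\[
\begin{array}{ccc}
N(\Theta) & \longrightarrow & (E,\sigma) \\
{\scriptstyle\pi}\downarrow & \nearrow{\scriptstyle\iota} & \\
(\Theta,g) & \xrightarrow{\ \gamma\ } & (\Lambda,h),
\end{array}
\]

where $N(\Theta)$ is an open neighbourhood of $\Theta$ in $E$ and $\pi$ is the orthogonal projection 
onto $\Theta$. A basic result of differential geometry is that if $\Theta$ is compact, then there 
is an $r > 0$ such that $\pi$ is a smooth map on the set of all vectors within a distance 
$r$ of $\Theta$ [13]. 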

Suppose that $X \in E$ is a gaussian random variable with mean $\theta \in \Theta$ and 
covariance operator$^1$ $\epsilon^2 c$, i.e. 

\[ X \sim N(\theta, \epsilon^2 c), \qquad \theta \in \Theta. \]

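A basic statistical problem is to determine an estimator "$\gamma(X)$," by which we mean 
an optimal extension of $\gamma$ off $\Theta$, in the minimax sense. To make this precise, let 
$g : E \to \Lambda$ be an estimator (map), and let dist be the riemannian distance function 
of $(\Lambda, h)$. Define a loss function by 

\[ R_\epsilon(g,\theta) = \int_{E} \mathrm{dist}(g(x),\gamma(\theta))^2\, \varphi_\epsilon(x-\iota(\theta))\, dx, \]

where $\varphi_\epsilon(u) = \exp(-|u|^2/2\epsilon^2)/(2\pi\epsilon^2)^{s/2}$, $|\cdot|$ is the norm on $E$ induced by $\sigma$, and 
$dx$ is the volume form on $E$ induced by $\sigma$.$^2$ Define the associated minimax risk 

\[ r_\epsilon(\Theta) = \inf_{g}\, \sup_{\theta\in\Theta} R_\epsilon(g,\theta). \]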

1.1. Results: Bayesian estimation. One may use a Bayesian approach to determine 
the asymptotically minimax estimator $g$. Here $\theta$ is viewed as a 
random variable with a prior distribution $\lambda(\theta)\,d\theta$ where $\int_{\theta\in\Theta} \lambda(\theta)\, d\theta = 1$ ($d\theta = d\nu_g$ 
is the riemannian volume of $(\Theta, g)$). The Bayesian risk of a map $g$ is 

\[ R_\epsilon(g;\lambda) = \int_{\Theta}\int_{E} \mathrm{dist}(g(x),\gamma(\theta))^2\, \lambda(\theta)\, \varphi_\epsilon(x-\iota(\theta))\, dx\, d\theta. \]

A Bayes estimator $g_\epsilon : E \to \Lambda$ is a map which minimizes the Bayesian risk over all 
maps. 

Before stating the main result of [3], recall that a riemannian connection permits 
one to define higher-order derivatives. In particular, $\nabla d$ is used to denote the 
hessian (second derivative) and $\tau = \mathrm{Tr}(\nabla d)$ denotes the tension field (laplacian), 
while Ric denotes the Ricci curvature [13]. 

In [3], the present authors proved 

Theorem 1.1. Let $g_\epsilon(x) = \exp_{g_0(x)}\left(\epsilon^2 g_2(x) + O(\epsilon^4)\right)$ be the Bayesian estimator 
for the Bayesian risk functional $R_\epsilon(\,\cdot\,;\lambda)$ with a fixed Bayesian prior 
$\lambda > 0$, where $g_0$, $g_2$ are the lowest order terms in the expansion. Then for all $\epsilon$ 
sufficiently small 



$^1$ By convention, the covariance operator is the induced inner product on the dual vector space 
$E^*$. If we regard $\sigma$ as a linear isomorphism $E \to E^*$, then the covariance operator is the inverse 
linear isomorphism $c = \sigma^{-1} : E^* \to E$. It is common to think of $E$ as a space of column vectors, 
and the dual as a space of row vectors, in which case $c$ is the transpose map $x \mapsto x'$ from row to 
column vectors. 

$^2$ One can introduce a $\sigma$-orthonormal coordinate system $x_i$ on $E$. In this case, $\sigma = \sum_i dx_i^2$ 
and $dx = dx_1 \wedge \cdots \wedge dx_s$. 
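(1) for all $x \in N(\Theta)$ with $|x - \pi(x)| < r$, 

\[ g_\epsilon(x) = \exp_{\Gamma(x)}\left( \epsilon^2 \left[ \tfrac{1}{2}\tau(\gamma) + d\gamma(\nabla\log\lambda) \right]_{\pi(x)} + O(\epsilon^4) \right), \]

where $\Gamma = \gamma\pi$, and exp is the exponential map of $(\Lambda, h)$; and 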

(2) 

e^JdeX |i|Vdrp - |r(7) +d7(VlogA)p - ^(dr, Ricdr)| + ©(e^). 

In section 2, this note applies Theorem 1.1 to compute the bayesian estimator 
and risk of the identity map for a wide class of compact group orbits and a 'linear' 
prior (see Theorem 2.3). 



1.2. Results: Bayesian regression. Let $\Theta$, $\Lambda$ be smooth manifolds. Let $\theta_1, \ldots, \theta_n$ 
be a collection of design points on the manifold $\Theta$ and let $y_1, \ldots, y_n$ be a random 
sample of points on $\Lambda$. Assume that the conditional probability density of $y_i$ given $\theta_i$ 
is $f(y_i \mid \gamma(\theta_i))$, where $\gamma : \Theta \to \Lambda$ is an unknown map and $\theta_i$ is a point on $\Theta$. One is 
interested in estimating the unknown map $\gamma$ by minimising a discrepancy function 

\[ S(\gamma) = \frac{1}{n}\sum_{i=1}^{n} \ell(y_i, \gamma(\theta_i)) \]

for a given loss function $\ell : \Lambda \times \Lambda \to [0, \infty)$ (see section 3). If one assumes that 
the space of admissible maps $\gamma$ is parameterized by a finite-dimensional manifold 
$\Gamma$, a solution to this regression problem is 

\[ \hat\gamma = \mathrm{argmin}\,\{ S(\gamma) : \gamma \in \Gamma \}. \]



One may also consider the regression problem from a bayesian perspective. In 
this case, one assumes there is a smooth volume form $d\gamma$ on $\Gamma$ and a prior 
distribution $\lambda(\gamma)\, d\gamma$. The bayesian regression problem is to derive the regressor by 
minimising the risk functional 

\[ \mathcal{R}(\hat\gamma) = \int_{\Gamma}\int_{\Lambda^n} \ell(\hat\gamma,\gamma)\, f(y\mid\gamma)\, \lambda(\gamma)\, dy\, d\gamma \]

over regressors $\hat\gamma : \Lambda^n \to \Gamma$. 

In section 3, these two regression problems are examined. First-order conditions 
that determine the regressors are proven. In addition, we examine the special cases 
where $\Theta, \Lambda \subset E$, $\Gamma$ is a manifold of linear maps and 

(1) $\ell$ is determined by the ambient euclidean structure; 

(2) $\ell$ is determined by the riemannian distance function on $\Lambda$ induced by the 
euclidean structure. 

The special case where $\Theta$ and $\Lambda$ are both the 2-dimensional unit sphere $S^2 \subset E^3$ and 
$\Gamma$ is the group of orientation-preserving linear isometries of $E^3$, $\mathrm{SO}(3)$, is examined 
in detail in each case. 
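
For case (1) with $\Theta = \Lambda = S^2$ and $\Gamma = \mathrm{SO}(3)$, minimising the discrepancy reduces to the classical orthogonal Procrustes (Wahba) problem, which has a closed-form solution through the singular-value decomposition. The following Python sketch illustrates this, assuming the loss is the squared ambient distance $\ell(y, y') = |y - y'|^2$; the function names `random_rotation`, `fit_rotation` and `discrepancy` are illustrative choices and do not come from the paper.

```python
import numpy as np

def random_rotation(rng):
    """Draw an element of SO(3) from the QR factorisation of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]           # flip one column so that det(q) = +1
    return q

def discrepancy(g, thetas, ys):
    """S(g) = (1/n) sum_i |y_i - g theta_i|^2 (assumed squared euclidean loss)."""
    return np.mean(np.sum((ys - thetas @ g.T) ** 2, axis=1))

def fit_rotation(thetas, ys):
    """argmin of S over SO(3): maximise Tr(g M') with M = sum_i y_i theta_i'."""
    m = ys.T @ thetas                 # 3 x 3 cross-covariance matrix
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))
    # The diagonal sign correction keeps the minimiser inside SO(3).
    return u @ np.diag([1.0, 1.0, d]) @ vt

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g_true = random_rotation(rng)
    thetas = rng.normal(size=(50, 3))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)   # design points on S^2
    ys = thetas @ g_true.T + 0.05 * rng.normal(size=(50, 3))
    ys /= np.linalg.norm(ys, axis=1, keepdims=True)           # noisy outputs on S^2
    g_hat = fit_rotation(thetas, ys)
    print(discrepancy(g_hat, thetas, ys), discrepancy(g_true, thetas, ys))
```

By construction the fitted rotation has a discrepancy no larger than that of any other rotation, including the one used to generate the data.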

The results of section 3 are formulated in Propositions 3.1, 3.7 and 3.11. 

2. Estimation of states on group orbits 

Let $(E^s, \sigma)$ be an $s$-dimensional euclidean space: that is, $E^s$ is an $s$-dimensional 
real vector space and $\sigma$ is a symmetric, positive-definite quadratic form on $E^s$. The 
group of linear isometries of $E^s$ is denoted by $\mathrm{O}(E^s, \sigma)$ and called the orthogonal 
group of $(E^s, \sigma)$. This group is denoted by $\mathrm{O}(E)$ when the euclidean structure $\sigma$ 
and dimension $s$ are understood. By choice of an orthonormal basis, $(E^s, \sigma)$ is 
linearly isometric to $\mathbf{R}^s$ with its standard orthonormal basis; the orthogonal group 
of this latter model euclidean space is denoted by $\mathrm{O}_s$, while $\mathrm{SO}_s$ is the subgroup of 
$\mathrm{O}_s$ with unit determinant. 

The set of linear transformations $E^s \to E^s$ is denoted by $\mathrm{Hom}(E^s, E^s)$. It is 
naturally a euclidean space with the trace inner product $(x, y) \mapsto \mathrm{Tr}(x'y)$. There 
is an orthogonal decomposition of $\mathrm{Hom}(E^s, E^s)$ into the sets of skew-symmetric 
transformations (denoted $\mathfrak{so}_s$) and symmetric transformations (denoted $\mathrm{sym}(E^s)$). 

Let $G \subset \mathrm{O}(E^s, \sigma)$ be a compact group of linear isometries of $(E^s, \sigma)$. A tangent 
vector $\xi \in T_1G$ in the tangent space to the identity of $G$ can be identified with a 
matrix. The matrix exponential map restricts naturally to give a map $\exp : T_1G \to G$. 
For each $g \in G$, the curve $t \mapsto \exp(t\xi)\cdot g$ is a curve in $G$ passing through $g$ at 
$t = 0$. Its derivative $\xi\cdot g$ is therefore a tangent vector in $T_gG$. Thus, each tangent 
space is canonically isomorphic to $T_1G$ via right translations.$^3$ One typically writes 
$T_1G = \mathfrak{g}$, and calls $\mathfrak{g}$ the Lie algebra of the Lie group $G$. As a set of matrices, $\mathfrak{g}$ 
is equipped with the Lie bracket denoted by $[\xi, \eta] = \xi\cdot\eta - \eta\cdot\xi$. One can easily 
verify that $\xi, \eta \in \mathfrak{g}$ implies that $[\xi, \eta] \in \mathfrak{g}$. In addition, for each $g \in G$, $\xi \in \mathfrak{g}$, the 
element $g\cdot\xi\cdot g^{-1} \in \mathfrak{g}$. It is conventional to write $\mathrm{Ad}_g\xi = g\cdot\xi\cdot g^{-1}$ and observe 
that $\mathrm{Ad} : G \to \mathrm{GL}(\mathfrak{g})$ is a representation, called the adjoint representation. One 
knows that $\frac{d}{dt}\big|_{t=0} \mathrm{Ad}_{\exp(t\xi)}\eta = [\xi, \eta]$, so the derivative $d_1\mathrm{Ad} =: \mathrm{ad} : \mathfrak{g} \to \mathfrak{gl}(\mathfrak{g})$ is a 
linear representation of $\mathfrak{g}$. 

The trace form $(\xi, \eta) \mapsto \mathrm{Tr}(\xi'\eta)$ is positive definite on $\mathfrak{g}$. Moreover, the trace 
form is invariant under the adjoint representation of $G$, i.e. $G$ acts as a group of 
isometries of this euclidean structure on $\mathfrak{g}$. For a subspace $V \subset \mathfrak{g}$, let $V^\perp$ denote 
its orthogonal complement with respect to the trace form. 
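
These identities are easy to check numerically. The sketch below does so for $G = \mathrm{SO}(3)$, assuming the usual identification of $\mathfrak{so}(3)$ with $\mathbf{R}^3$ through the cross product; the helper names (`hat`, `expm_series`, `Ad`, and so on) are illustrative, and the matrix exponential is computed from its power series only to keep the example self-contained.

```python
import numpy as np

def hat(w):
    """Element of so(3) with hat(w) @ u equal to the cross product w x u."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def expm_series(a, terms=40):
    """Matrix exponential by its power series (adequate for small matrices)."""
    out = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms):
        term = term @ a / k
        out = out + term
    return out

def bracket(xi, eta):
    """Lie bracket [xi, eta] = xi.eta - eta.xi."""
    return xi @ eta - eta @ xi

def Ad(g, xi):
    """Adjoint action g.xi.g^{-1}; for orthogonal g the inverse is the transpose."""
    return g @ xi @ g.T

def trace_form(xi, eta):
    return np.trace(xi.T @ eta)

def ad_by_finite_difference(xi, eta, h=1e-5):
    """Central difference approximation of d/dt Ad_{exp(t xi)} eta at t = 0."""
    gp, gm = expm_series(h * xi), expm_series(-h * xi)
    return (Ad(gp, eta) - Ad(gm, eta)) / (2.0 * h)

if __name__ == "__main__":
    xi, eta = hat([0.3, -1.2, 0.5]), hat([1.0, 0.4, -0.7])
    g = expm_series(xi)
    print(np.abs(ad_by_finite_difference(xi, eta) - bracket(xi, eta)).max())
    print(trace_form(Ad(g, xi), Ad(g, eta)) - trace_form(xi, eta))
```

Both printed quantities should be close to zero: the first reflects $d_1\mathrm{Ad} = \mathrm{ad}$, the second the $\mathrm{Ad}$-invariance of the trace form.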

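For each $\vartheta \in E$, let the set $G\cdot\vartheta = \{\phi \in E : \exists g \in G \text{ and } \phi = g\cdot\vartheta\}$ be the $G$-orbit 
of $\vartheta$ and let $G_\vartheta = \{g \in G : g\cdot\vartheta = \vartheta\}$ be the $G$-stabilizer of $\vartheta$. It is a well-known 
theorem that $G\cdot\vartheta$ is a smooth submanifold of $E$. The tangent space to $G\cdot\vartheta$ at $\phi$ 
can be identified with $\mathfrak{g}_\phi^\perp$, where $\mathfrak{g}_\phi \subset \mathfrak{g}$ is the Lie algebra of $G_\phi$. Indeed, since $G$ 
acts transitively, the map $\mathfrak{g} \to T_\phi(G\cdot\vartheta) : \xi \mapsto \xi\cdot\phi$ is onto and its kernel is $\mathfrak{g}_\phi$. If 
$\phi = g\cdot\vartheta$, then one sees that $G_\phi = g\cdot G_\vartheta\cdot g^{-1}$ and similarly for the Lie algebras. 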

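The normal bundle $N(G\cdot\vartheta)$ of $G\cdot\vartheta$ is isomorphic to the vector bundle 

\[ N(G\cdot\vartheta) = G \times_{G_\vartheta} (T_\vartheta G\cdot\vartheta)^\perp = G \times_{G_\vartheta} N_\vartheta(G\cdot\vartheta). \]

Here, $G \times N_\vartheta(G\cdot\vartheta)$ is the cartesian product of the group $G$ with the orthogonal 
complement $N_\vartheta(G\cdot\vartheta)$ to the tangent space to $G$'s orbit through $\vartheta$. The stabiliser $G_\vartheta$ acts 
linearly on $N_\vartheta(G\cdot\vartheta)$ and by right translation on $G$. The set $G \times_{G_\vartheta} N_\vartheta(G\cdot\vartheta)$ is the 
quotient space whose points are the sets ($G_\vartheta$-orbits) $[g, v] = \{(g\cdot h, h^{-1}\cdot v) : h \in G_\vartheta\}$ 
for each $(g, v) \in G \times N_\vartheta(G\cdot\vartheta)$. 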

It is also a well-known fact that there is an open neighbourhood of $G\cdot\vartheta$ which is 
$G$-equivariantly diffeomorphic to an open neighbourhood $T$ of $G\cdot\vartheta$ in $N(G\cdot\vartheta)$. See 
for generalities and [71 [U] for specifics on linear Lie groups. 

To simplify notation, $\Theta$ is used to denote $G\cdot\vartheta$ in some cases. 

2.1. The projection map onto $G\cdot\vartheta$. Let us now derive the projection map 
$\pi : T \to G\cdot\vartheta$. Given $x \in E$, assume that there is a $g \in G$ such that 

\[ g^{-1}x \in \vartheta + N_\vartheta(G\cdot\vartheta), \quad\text{which implies}\quad x \in g\vartheta + N_{g\vartheta}(G\cdot\vartheta), \tag{1} \]

by $G$-equivariance. In this case, we can define 

\[ \pi(x) = g\vartheta. \tag{2} \]

Figure 1. The tubular neighbourhood $T$ and the normal bundle $N(\Theta)$. 

$^3$ One can equally use left translations. 

Lemma 2.1. There is an open neighbourhood $T$ of $\Theta = G\cdot\vartheta$ such that the map $\pi$ 
defined in (2) is independent of $\vartheta$ in $\Theta$. In addition, $\pi$ is a real-analytic submersion 
whose fibres are open neighbourhoods of $0 \in N_\phi(\Theta)$ for each $\phi \in \Theta$. 

Proof. It suffices to observe that if $x \in N_\vartheta(G\cdot\vartheta)$, then one can take $g = 1 \bmod G_\vartheta$ 
in (1) and $\pi(x) = \vartheta$, and that (2) defines $\pi$ as a $G$-equivariant map from $N(G\cdot\vartheta)$ 
to $G\cdot\vartheta$. The lemma then follows from the tubular neighbourhood theorem [13]. □ 

Remark 2.2. 1/ In general, the affine planes $\vartheta + N_\vartheta(\Theta)$ and $\phi + N_\phi(\Theta)$ intersect 
each other, as in figure 1. At such a point of intersection $\pi$ is not single-valued; 
hence these points obstruct the extension of $\pi$ from a tubular neighbourhood of $\Theta$ 
to a globally-defined map on $E$. 2/ Lemma 2.1 is a consequence of the tubular 
neighbourhood theorem for group orbits. Moreover, many linear-algebraic 
decompositions are, in fact, an application of this tubular neighbourhood result. 

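2.2. Linear priors on $G\cdot\vartheta$. As noted in the introduction, the euclidean structure 
$\sigma$ induces a linear isomorphism $E \to E^* : v \mapsto \hat v(\cdot) = \sigma(v, \cdot)$.$^4$ For each $v \in E$, let 
$\hat v \in E^*$ be the dual vector induced by the euclidean structure $\sigma$ and let $f_v = \hat v|_{G\cdot\vartheta}$ 
be the restriction of $\hat v$ to the group orbit. In terms of the inclusion map $\iota : G\cdot\vartheta \to E$, 
one can write $f_v = \hat v \circ \iota$. Let $f_v^-$ be the minimum value of $f_v$ and $\bar f_v = \int f_v(\phi)\, d\phi$ 
be the mean value of $f_v$ with respect to $d\phi$, the unique $G$-invariant probability 
measure on $G\cdot\vartheta$. (One can define $\bar\phi = \int \iota(\phi)\, d\phi$ to be the mean element of $G\cdot\vartheta$, 
in which case $\bar f_v = \langle v, \bar\phi\rangle$. Since $\bar\phi$ is a fixed point of $G$, $\bar\phi = 0$ unless $E$ contains a 
trivial representation of $G$.) Define a bayesian prior density $\lambda = \lambda_v$ by 

\[ \lambda_v = \alpha f_v + \beta \tag{3} \]

where the real numbers $\alpha \ge 0$ and $\beta$ satisfy $\alpha \bar f_v + \beta = 1$ and $\alpha f_v^- + \beta = c > 0$. 
The chain rule shows that $df_v = d\hat v \circ d\iota$, whence 

\[ \nabla f_v(\phi) = d_\phi\pi(v), \qquad \nabla\log\lambda_v(\phi) = \frac{\alpha}{\lambda_v(\phi)} \times d_\phi\pi(v) \quad\text{for all } \phi \in G\cdot\vartheta. \]

The gradient vanishes at $\phi$ iff $v \in N_\phi(G\cdot\vartheta)$. The chain rule for second derivatives 
shows that $\nabla df_v = \nabla d\hat v(d\iota, d\iota) + d\hat v \circ \nabla d\iota = d\hat v \circ \nabla d\iota$ since $\nabla d\hat v = 0$ because $\hat v$ is 
linear. The tensor field $\nabla d\iota$ is the second fundamental form of $G\cdot\vartheta$ in $E$ and it is 
a measure of the curvature of $G\cdot\vartheta$. Application of the definition of $\nabla d\iota$ [6] shows 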



$^4$ One often thinks of this map as $v \mapsto v'$, $v$ maps to $v$-transpose. This notation is also used 
below. 
that $\nabla d\iota(\xi_\phi, \eta_\phi) = (1 - d_\phi\pi)\,\xi\cdot\eta\cdot\phi$ for all $\xi, \eta \in \mathfrak{g}$. Thus $\nabla df_\phi(\xi,\xi)|_\phi = -|\xi\cdot\phi|^2$ 
for all $\xi \in \mathfrak{g}_\phi^\perp \cong T_\phi G\cdot\vartheta$, so the maximum is a non-degenerate critical point of $f_\phi$. A 
well-known theorem in Morse theory states that for almost all $v \in E$, $f_v$ is a Morse 
function on $G\cdot\vartheta$ [13]. 

For the purposes of imposing a strong prior, a natural choice is $v = \vartheta$. The 
Cauchy-Schwarz inequality plus the fact that $G$ acts by isometries implies that 
$f_\vartheta$ (hence $\lambda_\vartheta$) attains its unique maximum value at $\vartheta$. 

Let $dg$ be the Haar probability measure on $G$. The Haar measure factors as 
$dh \cdot d\phi$, where $dh$ is the Haar probability measure on $H$, the stabiliser of $\vartheta$, and $d\phi$ 
is the unique $G$-invariant probability measure on $\Theta = G\cdot\vartheta$. Define 

\[ v_\vartheta = \int_{g\in G} dg\, \log\lambda_{g\cdot v}(\vartheta)\; g\cdot v \]

to be the mean of $v$ over $G\cdot v$ taken with respect to a peculiar measure. 



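Theorem 2.3. Let $\Theta = G\cdot\vartheta$ and $\gamma$ be the identity map of $\Theta$. Let the bayesian prior 
density $\lambda = \lambda_v$ be defined by (3). Let $x \in T$, $\vartheta = \pi(x)$ and $\xi \in \mathfrak{g}_\vartheta^\perp$ be the unique 
vector such that $\xi\cdot\vartheta = d_\vartheta\pi(v)$. The bayesian estimator $g_\epsilon$ and its risk equal 

\[ g_\epsilon(x) = \exp\left(\epsilon^2\xi + O(\epsilon^4)\right)\cdot\vartheta, \tag{4} \]

\[ R_\epsilon(g_\epsilon; \lambda_v) = \epsilon^2 \dim\Theta + \epsilon^4\left[\tfrac{1}{3}\,\mathrm{scal}_\lambda + \langle v_\vartheta, \tau(\iota)_\vartheta\rangle\right] + O(\epsilon^6), \tag{5} \]

where exp is the exponential map of the Lie group $G$, $\mathrm{scal}_\lambda$ is the average of the 
scalar curvature of $\Theta$ with respect to $d\vartheta\,\lambda$ and $\tau(\iota)$ is the normal vector field of $\Theta$. 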

The proof applies Theorem 1.1 and the fact that $G$ acts as a transitive 
isometry group of $\Theta$. It should be noted that, although $v_\vartheta$ is not independent of $\vartheta$, 
the inner product $\langle v_\vartheta, \tau(\iota)_\vartheta\rangle$ is independent. In addition, the integration-by-parts 
formula is needed to demonstrate the risk formula. 

In the particular case of $v = 0$, $\lambda$ is a flat prior density, the $G$-invariant measure 
$d\vartheta$ is the flat distribution and the estimator is $g_\epsilon(x) = \vartheta + O(\epsilon^4)$ with risk $R_\epsilon(g_\epsilon; 1) = 
\epsilon^2 \dim\Theta + \epsilon^4\,\mathrm{scal}/3 + O(\epsilon^6)$ where scal is the mean scalar curvature of $\Theta$. The 
flat prior produces the minimax estimator in this case. 



2.2.1. A sample application: $S^2$. Let us apply theorem 2.3 to the case where $E = 
E^3$, $\Theta$ is the 2-dimensional unit sphere in $E^3$ and $G = \mathrm{SO}(3)$ is the group of 
linear, orientation-preserving isometries of $E^3$. In this case, the projection map 
is $\pi(x) = x/|x|$ and the projection of $v$ onto $T_\vartheta\Theta$ is 
the orthogonal projection $\bar v = d_\vartheta\pi(v) = v - \langle v, \vartheta\rangle\vartheta$. 

The bayesian estimator in this case is 

where $\hat v \in \mathfrak{so}(3)$ is the rotation by $\pi/2$ radians 
counterclockwise in the plane orthogonal to $v$. 

If one supposes that $v \in S^2$, then $\beta = 1$ and $0 \le \alpha < 1$. 
Since $\tau(\iota)_\vartheta = -2\vartheta$, one computes that 




{vv,t{l)^) = (a ^ - l)log 



1 



1 



1 

a 



(6) 



On the other hand, the scalar curvature of the 2-dimensional unit sphere is twice 
the Gaussian curvature, hence is 2, and the mean of $\lambda$ is 1, so $\mathrm{scal}_\lambda = 2$. The 
bayesian risk is therefore 

\[ R(g_\epsilon; \lambda_v) = 2\epsilon^2 + \epsilon^4\left(\tfrac{2}{3} + \langle v_\vartheta, \tau(\iota)_\vartheta\rangle\right) + O(\epsilon^6). \]

Inspection of ^ shows that the right-hand side is a at a = 0, 1 and it is monotone 
increasing on [0, 1]. This verifies that the flat prior (a — 0) yields the second-order 
minimax estimator. 
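
The flat-prior case on $S^2$ can be checked numerically. The sketch below evaluates the risk of the leading-order estimator $x/|x|$ by Gaussian quadrature and compares it with $2\epsilon^2 + \tfrac{2}{3}\epsilon^4$, that is, $\epsilon^2\dim\Theta + \epsilon^4\,\mathrm{scal}/3$ with $\dim\Theta = 2$ and $\mathrm{scal} = 2$. The function name `sphere_risk` and the quadrature orders are illustrative choices, and the use of the estimator $x/|x|$ in place of the exact Bayes estimator is an assumption that should only affect terms beyond second order.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss
from numpy.polynomial.laguerre import laggauss

def sphere_risk(eps, n_z=60, n_u=60):
    """E[dist(X/|X|, theta)^2] for X = theta + eps*xi on the unit sphere in E^3.

    By rotational symmetry take theta = e_3 and split xi into a normal
    component z ~ N(0, 1) and a tangential part of length r, where
    u = r^2 / 2 is exponentially distributed with mean 1. The geodesic
    distance from X/|X| to theta is the angle atan2(eps*r, 1 + eps*z).
    """
    z, wz = hermegauss(n_z)            # nodes/weights for weight exp(-z^2/2)
    u, wu = laggauss(n_u)              # nodes/weights for weight exp(-u)
    wz = wz / np.sqrt(2.0 * np.pi)     # normalise to a probability measure
    r = np.sqrt(2.0 * u)
    angle = np.arctan2(eps * r[None, :], 1.0 + eps * z[:, None])
    return float(wz @ (angle ** 2) @ wu)

if __name__ == "__main__":
    for eps in (0.05, 0.1, 0.2):
        second_order = 2.0 * eps ** 2 + (2.0 / 3.0) * eps ** 4
        print(eps, sphere_risk(eps), second_order)
```

The difference between the two printed columns should shrink like $\epsilon^6$ as $\epsilon$ decreases.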

2.3. Derivation and application of the projection map. This section applies 
Lemma 2.1 to a wide range of orbit spaces. Lemma 2.1 says that to construct the 
bayesian estimator of Theorem 1.1, it is necessary to give a concrete description of 
a tubular neighbourhood $T$ and the projection map $\pi : T \to \Theta$ from the tubular 
neighbourhood to the group orbit $\Theta$. 

It is also necessary to give a concrete description of the exponential map of the 
riemannian manifold $(\Theta, g) \subset (E^s, \sigma)$. This problem is solved as in Theorem 2.3, 
where one uses the linear isomorphism between $T_\vartheta\Theta$ and $\mathfrak{g}_\vartheta^\perp$, which pulls back the 
riemannian exponential map to the Lie group's exponential map. 

[31 Chapters 1-3] provide a nice background, aimed at statisticians, for many 
applications of several of the orbit spaces considered below. 

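2.3.1. The sphere $S^{n-1}$. Let $G = \mathrm{O}(E^n)$, and $\vartheta \in E^n$ be non-zero. The group orbit 
$G\cdot\vartheta$ is the sphere of radius $r = |\vartheta|$. Without loss of generality, one can suppose 
that $r = 1$ and $E^n$ has a basis where $e_1 = \vartheta$. In this case, $N_\vartheta(G\cdot\vartheta) = \mathbf{R}\vartheta$ and 
$g^{-1}x \in \vartheta + N_\vartheta(G\cdot\vartheta)$ iff $g^{-1}x = \lambda e_1$ iff $x = \lambda g e_1$ and $\lambda = \pm|x|$. Because $\pi$ must be 
the identity on $G\cdot\vartheta$, one sees that $\lambda > 0$ and therefore, for all $x \neq 0$, 

\[ \pi(x) = g\vartheta = x/|x|. \]

In this case $T = E - \{0\}$. 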

2.3.2. The Stiefel manifold. There is a natural generalisation of the unit sphere 
introduced by Stiefel [31 [H]. Let $v = [v_1 \cdots v_k]$ be a $k$-tuple of unit vectors $v_i \in E^n$ 
for $k \le n$ which are mutually orthogonal. The set of all such orthonormal $k$-frames 
$v$ is called a Stiefel manifold and denoted by $V_k(E^n)$. One can naturally identify 
$V_k(E^n)$ as a subset of $E = \mathrm{Hom}(E^k, E^n)$ (the $n \times k$ real matrices). The euclidean 
structure $\sigma = \langle\cdot,\cdot\rangle$ on $E$ is defined by 

\[ \langle x, y\rangle = \mathrm{Tr}(x'y) \tag{7} \]

for all $x, y \in E$ where $x'$ is the transpose of $x$. The group $G = \mathrm{O}(E^n)$ acts on $E$ by 
left multiplication and, with the frame $\vartheta = [e_1 \cdots e_k]$, one has $V_k(E^n) = G\cdot\vartheta$. 

Given $x \in E$, the map 

\[ \kappa(x) = x'x, \qquad \kappa : \mathrm{Hom}(E^k, E^n) \to \mathrm{sym}(E^k), \]

from the $n \times k$ matrices to the symmetric $k \times k$ matrices, defines a submersion 
when $x$ is of maximal rank $k$, and $G$ acts transitively on the fibre of $\kappa$. Thus, if 
$x \in E$ is of maximal rank, then the normal space $x + N_x(G\cdot x)$ can be identified 
with $x'x + \mathrm{sym}(E^k)$ via the linearized map $d_x\kappa$. 

To compute the projection map $\pi : T \to V_k(E^n)$: let $T$ be the connected 
component containing $\vartheta$ of the set of $x \in E$ of maximal rank. For each $x \in T$, $x'x$ 
is a symmetric, positive-definite matrix and therefore $x'x$ has a unique symmetric 
positive-definite square root $r = (x'x)^{1/2}$. Let us define 

\[ \pi(x) = x(x'x)^{-1/2}, \qquad \pi : T \to V_k(E^n). \tag{8} \]






It is clear that $\pi$ is a $G$-equivariant map, $\pi(\vartheta) = \vartheta$ and since $\kappa \circ \pi$ maps $T$ to 
$1 \in \mathrm{sym}(E^k)$, the image is $V_k(E^n)$ and $\pi|_{V_k(E^n)} = \mathrm{id}$. These facts suffice to show 
that the map $\pi$ is indeed the projection map of the normal bundle. (If one had 
taken another square root of $x'x$ to define $\pi$, then $\pi(\vartheta) \neq \vartheta$, so that map could not 
be the projection map of a tubular neighbourhood). 

In the general case, let $\vartheta \in E$ be of maximal rank and let $\tau$ be the unique 
positive-definite symmetric square root of $\vartheta'\vartheta$. The projection map $\pi : T \to G\cdot\vartheta$ is 
then 

\[ \pi(x) = x(x'x)^{-1/2}\tau, \qquad \pi : T \to G\cdot\vartheta, \]

where $T$ is the set of maximal rank elements in $E$. 
One can specialize the above construction to obtain: 

$k = 1$ : In this case, $V_1(E^n) = S^{n-1}$ and $(x'x)^{1/2} = |x|$, so (8) specializes to yield 
the projection map onto $S^{n-1}$; 

$k = 2$ : In this case, $V_2(E^n)$ is the unit sphere bundle of $S^{n-1}$, so (8) specializes to 
yield the projection map from the set of non-collinear vectors in $E^n \times E^n$ 
to the unit sphere bundle $S(S^{n-1})$; 

$k = n$ : In this case, $V_n(E^n) = \mathrm{O}(E^n)$ and $p = (x'x)^{1/2}$ is the polar factor in the 
polar decomposition $x = gp$ where $g \in \mathrm{O}(E^n)$ and $p \in \mathrm{sym}(E^n)$. Thus (8) 
specializes to yield the projection map of an arbitrary invertible matrix onto 
its orthogonal part. See example 2.3.9 below for a general construction. 
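
The projection (8) is straightforward to compute. The following Python sketch forms $(x'x)^{-1/2}$ from an eigen-decomposition of the Gram matrix; the function name `stiefel_projection` is an illustrative choice, and the input is assumed to have maximal rank $k$.

```python
import numpy as np

def stiefel_projection(x):
    """pi(x) = x (x'x)^{-1/2} for an n x k matrix x of maximal rank k."""
    # x'x is symmetric positive-definite, so eigh gives positive eigenvalues.
    evals, evecs = np.linalg.eigh(x.T @ x)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return x @ inv_sqrt

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    x = rng.normal(size=(5, 3))
    p = stiefel_projection(x)
    print(np.round(p.T @ p, 12))                     # identity: p is an orthonormal 3-frame
    v = rng.normal(size=(5, 1))
    print(np.allclose(stiefel_projection(v), v / np.linalg.norm(v)))   # case k = 1
```

Using the symmetric positive-definite square root (rather than, say, a Cholesky factor) is what makes the map restrict to the identity on $V_k(E^n)$ and commute with left multiplication by $\mathrm{O}(E^n)$.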



2.3.3. The real Grassmannian manifold. Another group orbit space that is closely 
related to the Stiefel manifold is the manifold of unoriented $k$-dimensional planes in 
$E^n$, called the Grassmannian manifold [11]. Let $G_k(E^n)$ denote the Grassmannian 
manifold of unoriented $k$-planes in $E^n$. A $k$-plane $\Pi$ in $E^n$ is uniquely 
characterized by an orthogonal projection $p_\Pi \in \mathrm{Hom}(E^n, E^n)$ which is symmetric, has an 
image equal to $\Pi$ and kernel equal to $\Pi^\perp$. Since each plane $\Pi$ and its orthogonal 
complement admit an orthonormal basis, we have the following natural description 
of the Grassmannian manifold as an orbit space ($k + l = n$) 

\[ G_k(E^n) = G\cdot\vartheta, \qquad G = \mathrm{O}(E^n), \qquad \vartheta = \begin{pmatrix} 1_k & 0 \\ 0 & 0_l \end{pmatrix}, \]

where the action of $G$ on the symmetric matrices $\mathrm{sym}(E^n) \subset \mathrm{Hom}(E^n, E^n)$ is by 
conjugation/congruence 

\[ g\cdot x = gxg' \qquad \forall g \in G,\ x \in \mathrm{sym}(E^n). \]

The Grassmannian manifold is equivariantly diffeomorphic to 

\[ G_k(E^n) \cong \mathrm{O}(E^n)/\mathrm{O}(E^k) \times \mathrm{O}(E^l). \]

To identify the normal space $N_\vartheta(G_k(E^n))$, a computation shows that 

\[ T_\vartheta G_k(E^n) = \left\{ \begin{pmatrix} 0_k & \alpha \\ \alpha' & 0_l \end{pmatrix} : \alpha \in \mathrm{Hom}(E^l, E^k) \right\}, \]

whence the normal space is 

\[ N_\vartheta(G_k(E^n)) = \left\{ \begin{pmatrix} \beta & 0 \\ 0 & \gamma \end{pmatrix} : \beta \in \mathrm{sym}(E^k),\ \gamma \in \mathrm{sym}(E^l) \right\}. \]

Recall that every $x \in \mathrm{sym}(E^n)$ is congruent via some $g \in \mathrm{O}(E^n)$ to a diagonal 
matrix $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ where the eigenvalues satisfy $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. 
One may then define, for all those $x$ with $\lambda_k > \lambda_{k+1}$, the projection of $x$ onto the 
Grassmannian manifold via 

\[ x = g\Lambda g' \implies \pi(x) = g\vartheta g'. \]

Equivalently, $x$ is congruent via a $g_1 \in \mathrm{O}(E^n)$ to a matrix $y$ in $N_\vartheta(G_k(E^n))$ where 
the eigenvalues of $\beta$ dominate those of $\gamma$. In this case, one can define $\pi(x) = g_1\vartheta g_1'$. 
The two definitions of $\pi$ coincide since $g = g_1$ mod the stabilizer of $\vartheta$. In this case, 
the tubular neighbourhood $T$ is the connected component containing $\vartheta$ of the set 
of $x \in \mathrm{sym}(E^n)$ which have eigenvalues such that $\lambda_k > \lambda_{k+1}$. 
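
A minimal Python sketch of this projection follows: it diagonalises $x$ and returns the orthogonal projector onto the span of the $k$ leading eigenvectors, which is $g\vartheta g'$. The function name `grassmann_projection` and the explicit gap check are illustrative choices.

```python
import numpy as np

def grassmann_projection(x, k):
    """Project a symmetric n x n matrix onto the rank-k orthogonal projectors.

    Requires 0 < k < n and a gap lambda_k > lambda_{k+1} between the k-th and
    (k+1)-th largest eigenvalues; otherwise the projection is not defined.
    """
    evals, evecs = np.linalg.eigh(x)          # eigenvalues in ascending order
    if not evals[-k] > evals[-k - 1]:
        raise ValueError("no spectral gap: x lies outside the tubular neighbourhood")
    top = evecs[:, -k:]                       # eigenvectors of the k largest eigenvalues
    return top @ top.T                        # equals g theta g'

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
    x = q @ np.diag([5.0, 4.0, 3.0, 1.0, 0.5]) @ q.T
    p = grassmann_projection(x, 2)
    print(np.allclose(p, q[:, :2] @ q[:, :2].T))
```

The result does not depend on the choice of eigenvector basis inside the leading eigenspace, which mirrors the statement that $g$ is determined only modulo the stabilizer of $\vartheta$.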

Remark 2.4. In the above construction, one may replace $(E^n, \mathrm{sym}(E^n), \mathrm{O}(E^n))$ 
and the real transpose by $(\mathbf{C}^n, \mathrm{sym}(\mathbf{C}^n), \mathrm{U}_n \subset \mathrm{O}(\mathbf{C}^n))$ and the conjugate transpose 
(resp. $(\mathbf{H}^n, \mathrm{sym}(\mathbf{H}^n), \mathrm{Sp}(\mathbf{H}^n) \subset \mathrm{O}(\mathbf{H}^n))$ and the quaternionic conjugate transpose) 
to obtain the grassmannian of complex $k$-planes in $\mathbf{C}^n$ (resp. the grassmannian of 
quaternionic $k$-planes in quaternionic $n$-space $\mathbf{H}^n$). In these cases, one views $\mathbf{C}^n$ 
(resp. $\mathbf{H}^n$) as a real euclidean vector space, where the euclidean structure is 
provided by the real part of the hermitian (resp. quaternionic) structure, and the 
isometries preserve both the euclidean structure and the complex (resp. 
quaternionic) structure. The construction of the projection map of the tubular 
neighbourhood is essentially the same. Since the conclusions of Theorem 1.1 rely only 
on the real euclidean structure, the conclusions remain valid. 

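2.3.4. The singular-value decomposition. Let $E = \mathrm{Hom}(E^k, E^n)$ (the $n \times k$ real 
matrices) with the euclidean structure defined as in 2.3.2 and let $G = \mathrm{O}(E^n) \times 
\mathrm{O}(E^k)$ act on $E$ by 

\[ g\cdot x = g_1 x g_2^{-1} \qquad \forall g = (g_1, g_2) \in G,\ x \in E. \tag{9} \]

Without loss of generality, one may assume that $k \ge n$. In this case, the well-known 
singular-value decomposition says that there is a $g \in G$ such that $g^{-1}\cdot x = \vartheta$ where 
$\vartheta$ is in the "diagonal" form 

\[ \vartheta = \begin{pmatrix} \mathrm{diag}(\vartheta_1, \vartheta_2, \ldots, \vartheta_n) & 0 \end{pmatrix} \quad\text{and}\quad \vartheta_1 \ge \vartheta_2 \ge \cdots \ge \vartheta_n \ge 0. \tag{10} \]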

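To compute the normal space of $G\cdot\vartheta$ is somewhat involved, but one can simplify 
the computation in the following way. 

A non-degenerate symmetric bilinear form $\eta$ is of index $(n, k)$ if it is positive 
definite (resp. negative definite) on a subspace of dimension $n$ (resp. $k$). Let $E^{n,k}$ 
be a real vector space with indefinite inner product $\eta$ of index $(n, k)$ [14]. The 
orthogonal group, $H = \mathrm{O}(E^{n,k})$, of this pseudo-euclidean space is non-compact but 
its maximal compact subgroup is $G = \mathrm{O}(E^n) \times \mathrm{O}(E^k)$.$^5$ The Lie algebra, $\mathfrak{h} = \mathfrak{o}_{n,k}$, 
of $\mathrm{O}(E^{n,k})$ contains the subalgebra $\mathfrak{g} = \mathfrak{o}_n \oplus \mathfrak{o}_k$ and its orthogonal complement 
relative to the trace form, the subspace 

\[ \mathfrak{p} = \left\{ \begin{pmatrix} 0_n & x \\ x' & 0_k \end{pmatrix} : x \in \mathrm{Hom}(E^k, E^n) \right\}. \]

The action by conjugation of $G = \mathrm{O}(E^n) \times \mathrm{O}(E^k)$ on $\mathfrak{p}$ is naturally identified with 
the action defined in (9). 

For $\vartheta \in \mathfrak{p}$, the orbit of $G$ has 

\[ T_\vartheta(G\cdot\vartheta) = \mathrm{ad}_{\mathfrak{g}}\vartheta, \qquad N_\vartheta(G\cdot\vartheta) = \mathfrak{p} \cap (\mathrm{ad}_{\mathfrak{g}}\vartheta)^\perp = \mathfrak{p}_\vartheta, \tag{11} \]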

$^5$ In the special case of $k = 1$, $E^{n,1}$ has Lorentzian geometry, which is important in special 
relativity [14]. 
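where $\mathfrak{p}_\vartheta$ is the intersection of the centralizer of $\vartheta$ in $\mathfrak{h}$ with $\mathfrak{p}$. If one supposes that 

\[ \hat\vartheta = \begin{pmatrix} 0_n & \vartheta \\ \vartheta' & 0_k \end{pmatrix} \in \mathfrak{p}, \quad\text{and $\vartheta$ is in the diagonal form of (10)}, \]

then $\mathfrak{p}_{\hat\vartheta}$ contains all elements of the same form as $\hat\vartheta$. Therefore, for 

\[ \hat x = \begin{pmatrix} 0_n & x \\ x' & 0_k \end{pmatrix} \]

one knows from the singular-value decomposition of $x$, that $\hat x$ is conjugate via a 
$g \in G$ to an element in $\mathfrak{p}_{\hat\vartheta}$. Thus, one has that 

\[ \pi(\hat x) = g\hat\vartheta g^{-1} \quad\text{where } g = (g_1, g_2) \text{ and } g_1^{-1} x g_2 \text{ is diagonal}, \qquad \pi : \hat T \to G\cdot\hat\vartheta, \]

or, equivalently 

\[ \pi(x) = g_1\vartheta g_2^{-1}, \qquad \pi : T \to G\cdot\vartheta, \]

where $T$ is the set of $x$ whose singular values have collisions nowhere except possibly 
where those of $\vartheta$ collide and $\hat T$ is defined similarly. 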

Remark 2.5. From an applied point of view, one may wish to approximate the 
matrix $x \in \mathrm{Hom}(E^k, E^n)$ by a low rank matrix. To do this, one could specify $\vartheta$ in 
(10) with $\vartheta_i = 1$ for $i = 1, \ldots, l$ and $\vartheta_i = 0$ for $i > l$. One can then use the projection map $\pi$ 
to project $x$ onto the low-rank matrix orbit $G\cdot\vartheta$. In applied mathematics, one 
approximates $x$ by a low rank matrix in a slightly different manner. One computes 
the singular-value decomposition $\Lambda = g_1^{-1} x g_2$ with $\Lambda$ in the diagonal form (10), one 
truncates $\Lambda$ to a diagonal matrix $\Lambda_0$ by zeroing out the singular values $\lambda_{l+1}, \ldots, \lambda_n$ 
and then one defines $x_0 = g_1\Lambda_0 g_2^{-1}$ to be the low-rank approximation (in practice, 
$l \ll k$ so one saves only the $l$ right and left singular vectors, not $g_1$ and $g_2$). In 
this case, one knows that the set of rank $l$ matrices is the union over all rank $l$ 
orbits, and one uses $x$ to determine the particular orbit onto which $x$ is projected. 
By construction, this determines the rank $l$ orbit that is closest to $x$. 
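
The two low-rank constructions of this remark can be compared directly. In the Python sketch below, `orbit_projection` maps $x$ to $g_1\vartheta g_2^{-1}$ for a prescribed diagonal $\vartheta$, while `truncated_svd` keeps the $l$ largest singular values of $x$ itself; both function names are illustrative, and `theta_diag` is assumed to be listed in non-increasing order so that it matches the ordering of the computed singular values.

```python
import numpy as np

def orbit_projection(x, theta_diag):
    """pi(x) = g1 theta g2^{-1}: keep the singular vectors of x, impose theta."""
    u, _, vt = np.linalg.svd(x, full_matrices=False)
    return u @ np.diag(np.asarray(theta_diag, dtype=float)) @ vt

def truncated_svd(x, l):
    """x_0 = g1 Lambda_0 g2^{-1}: zero out all but the l largest singular values."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # Only the l leading left and right singular vectors are needed.
    return u[:, :l] @ np.diag(s[:l]) @ vt[:l, :]

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    x = rng.normal(size=(6, 4))
    s = np.linalg.svd(x, compute_uv=False)
    x0 = truncated_svd(x, 2)
    p = orbit_projection(x, [1.0, 1.0, 0.0, 0.0])
    # Squared Frobenius error of the truncation equals the discarded energy.
    print(np.linalg.norm(x - x0) ** 2, np.sum(s[2:] ** 2))
    # Both matrices have the same column space.
    print(np.allclose(p @ p.T @ x0, x0))
```

The truncation lets $x$ choose the orbit (through its own leading singular values), whereas the orbit projection fixes the orbit in advance through $\vartheta$; the singular vectors used are the same in both cases.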

2.3.5. The lagrangian grassmannian. A totally real subspace $\Pi$ of $\mathbf{C}^n$ is a real 
subspace which has the distinguished property that $\Pi \cap i\Pi = 0$; a totally real 
$n$-plane in $\mathbf{C}^n$ is also called a lagrangian plane. Let $\Lambda_n = \mathrm{U}_n/\mathrm{O}_n$ be the manifold of 
lagrangian planes in $\mathbf{C}^n$. This manifold arose in Maslov's work on quantisation pT| . 
It is well-known in hamiltonian mechanics that the stable and unstable subspaces 
of a hyperbolic linear hamiltonian system are both lagrangian planes T . 

One can embed $\Lambda_n$ into $E = \mathrm{sym}(\mathbf{C}^n)$, the subspace of complex $n \times n$ matrices 
which are symmetric under the transpose (not conjugate transpose). The euclidean 
structure $\sigma = \langle\cdot,\cdot\rangle$ on $E$ is defined by 

\[ \langle x, y\rangle = \mathrm{Tr}(x^*y) \qquad \forall x, y \in \mathrm{sym}(\mathbf{C}^n), \]

where $x^*$ is the conjugate transpose of $x$. The unitary group $G = \mathrm{U}_n$ acts by 
isometries of $(E, \sigma)$ by 

\[ g\cdot x = gxg' \qquad \forall g \in \mathrm{U}_n,\ x \in E. \tag{12} \]

The stabilizer of $\vartheta = 1 \in E$ under this action is the real orthogonal subgroup $\mathrm{O}_n$, 
so $\Lambda_n = G\cdot\vartheta$. 

To compute the projection map, one observes that $T_\vartheta(G\cdot\vartheta) = i\cdot\mathrm{sym}(\mathbf{R}^n)$ and 
$N_\vartheta(G\cdot\vartheta) = \mathrm{sym}(\mathbf{R}^n)$. To define the projection of $x \in E$ onto $G\cdot\vartheta$, it is necessary 
that there exists a unitary $g \in \mathrm{U}_n$ such that $g^{-1}\cdot x \in \vartheta + N_\vartheta(G\cdot\vartheta) = \mathrm{sym}(\mathbf{R}^n)$. 
Thus, 

\[ \exists g \in \mathrm{U}_n,\ p \in \mathrm{sym}(\mathbf{R}^n) \text{ such that } x = gpg' \implies \pi(x) = gg'. \tag{13} \]
To see the connection with the above description of $\Lambda_n$, note that if $x \in \mathrm{sym}(\mathbf{C}^n)$ 
admits a factorization as in (13), then the symmetric quadratic form $q_x(u, v) = u'xv$ 
is totally real on the real subspace $w\cdot\mathbf{R}^n \subset \mathbf{C}^n$ where $w = (g')^{-1}$, and conversely, 
if $q_x$ is totally real on $w\cdot\mathbf{R}^n$, then $x$ admits a factorization as in (13). Provided 
that $x$ is non-degenerate, $w \in \mathrm{U}_n$, and hence $g$, is uniquely defined up to an element 
in $\mathrm{O}_n$. 
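
Near $\vartheta = 1$ the real symmetric factor $p$ in (13) is positive definite, and in that case $gg'$ coincides with the unitary factor of the polar decomposition of $x$: since $g'\bar g = 1$, one has $x^*x = \bar g p^2 g'$ and hence $x(x^*x)^{-1/2} = gg'$. The Python sketch below uses this observation; the function name `lagrangian_projection` is illustrative, and the positivity of $p$ is an assumption of the sketch rather than a statement taken from the text.

```python
import numpy as np

def lagrangian_projection(x):
    """Unitary polar factor of a complex symmetric matrix x.

    If x = g p g' with g unitary and p real symmetric positive-definite
    (g' is the plain transpose), the result equals g g'.
    """
    u, _, vh = np.linalg.svd(x)
    return u @ vh

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    n = 4
    g, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    a = rng.normal(size=(n, n))
    p = a @ a.T + np.eye(n)            # real symmetric positive-definite
    x = g @ p @ g.T                    # complex symmetric, as in (13)
    w = lagrangian_projection(x)
    print(np.allclose(w, g @ g.T))     # recovers g g'
    print(np.allclose(w, w.T))         # the result is again complex symmetric
```

Replacing $g$ by $g\,o$ with $o \in \mathrm{O}_n$ changes neither $gg'$ nor the class of admissible $p$, in agreement with the statement that $g$ is defined only up to an element of $\mathrm{O}_n$.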

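Remark 2.6. As in the singular-value decomposition, there is a natural Cartan 
decomposition that is associated with this example [7]. Let $H = \mathrm{Sp}(\mathbf{R}^{2n})$ be the 
group of symplectic automorphisms of $\mathbf{R}^{2n} = \mathbf{C}^n$ where the symplectic form is the 
standard skew-symmetric bilinear form. 

The Lie algebra $\mathfrak{h}$ of $H$ admits a Cartan decomposition $\mathfrak{h} = \mathfrak{k} + \mathfrak{p}$ where 

\[ \mathfrak{k} = \left\{ \begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix} : \alpha = -\alpha',\ \beta = \beta' \right\}, \qquad \mathfrak{p} = \left\{ \begin{pmatrix} a & b \\ b & -a \end{pmatrix} : a = a',\ b = b' \right\}, \]

where all matrices are real. The maps $(\alpha, \beta) \mapsto \alpha + i\beta$ and $(a, b) \mapsto a + ib$ show that 
$\mathfrak{k} \cong \mathfrak{u}_n$ and $\mathfrak{p} \cong \mathrm{sym}(\mathbf{C}^n)$. The action of $K \cong \mathrm{U}_n$ on $\mathfrak{p}$ by conjugation is identified 
with the action on $\mathrm{sym}(\mathbf{C}^n)$ by congruences (12). We note that, by the theory of 
Cartan subalgebras, an $x = (a, b) \in \mathfrak{p}$ may be diagonalized over $\mathbf{R}$ on a basis that 
is simultaneously symplectic and orthogonal, that is, $x \in \mathfrak{p}$ is conjugate to a real 
diagonal matrix via a unitary transformation [7]. This implies the validity of the 
decomposition (13) on the set of $x \in \mathfrak{p}$ ($x \in \mathrm{sym}(\mathbf{C}^n)$) which are non-singular. 

To compute the projection map $\pi : T \to G\cdot\vartheta$ when $\vartheta$ is in general position, it is 
easiest to apply (11). 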

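Remark 2.7. In the above construction, one may replace $(E^n, \mathbf{C}^n, E = \mathrm{sym}(\mathbf{C}^n), \mathrm{U}_n \subset 
\mathrm{O}(\mathbf{C}^n))$ and the real transpose by $(\mathbf{C}^n, \mathbf{H}^n, E = \mathrm{sym}(\mathbf{H}^n), \mathrm{Sp}(\mathbf{H}^n) \subset \mathrm{O}(\mathbf{H}^n))$ and 
the complex conjugate transpose. The orbit of $\vartheta = 1$ is the homogeneous space 
$\mathrm{Sp}(\mathbf{H}^n)/\mathrm{U}(\mathbf{C}^n)$, which is the grassmannian of totally complex $n$-planes in $\mathbf{H}^n$. 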

2.3.6. The isotropic grassmannians. There are several distinguished orbits in $\mathrm{sym}(\mathbf{C}^n)$ 
in addition to the lagrangian grassmannian. From the natural embedding $\mathbf{R}^k \subset 
\mathbf{R}^n \subset \mathbf{C}^n$, one obtains the grassmannian manifold of isotropic (or totally real) 
$k$-planes in $\mathbf{C}^n$ 

\[ \Lambda_{k,n} = \mathrm{U}_n\cdot\mathbf{R}^k = \mathrm{U}_n/\mathrm{O}_k \times \mathrm{U}_l \qquad (k + l = n). \]

If one defines 

\[ \vartheta = \begin{pmatrix} 1_k & 0 \\ 0 & 0_l \end{pmatrix}, \quad\text{then}\quad \Lambda_{k,n} = \mathrm{U}_n\cdot\vartheta \subset \mathrm{sym}(\mathbf{C}^n). \]

The grassmannians of totally real $k$-planes in $\mathbf{C}^n$ arise naturally in hamiltonian 
mechanics. For example, the tangent spaces to an orbit of the Keplerian 2-body 
problem trace a closed curve in $\Lambda_{k,n}$. 

To compute the tubular neighbourhood of $\Lambda_{k,n}$ and its projection map, remark 
2.6 implies that each $x \in \mathrm{sym}(\mathbf{C}^n)$ is congruent via a $g \in \mathrm{U}_n$ to a real diagonal 
$\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. If $\lambda_k > \lambda_{k+1}$, then the first $k$ largest 
eigenvalues and consequently the sum of the eigenspaces is uniquely determined. 
One can then define $\pi(x) = g\vartheta g'$, which is well-defined. 

Geometrically, this condition amounts to the following. Remark 2.6 implies 
that there is some lagrangian plane $\mathcal{L}$ on which $q_x$ is totally real. If the condition 
on the eigenvalues of $x$ is satisfied, then there is a unique isotropic $k$-plane $\mathcal{L}_k \subset \mathcal{L}$ 
such that $\mathcal{L} = \mathcal{L}_k \oplus \mathcal{L}_l$. The lagrangian $\mathcal{L}$ and $\mathcal{L}_l$ are not uniquely defined (if $q_x$ is 
degenerate), but $\mathcal{L}_k$ itself is. The map $x \mapsto \mathcal{L}_k$ is the projection map of the tubular 
neighbourhood. 
2.3.7. The manifold of orthogonal complex structures on $\mathbf{R}^{2n}$. A complex structure 
$J$ on $\mathbf{R}^{2n}$ is a linear map such that $J^2 = -1$; it is orthogonal if $J' = J^{-1}$. A 
complex structure has eigenvalues $\pm i$ (repeated $n$ times), so it is necessarily orientation 
preserving. The conjugate of an orthogonal complex structure by an orthogonal map is also an 
orthogonal complex structure, and conversely, any orthogonal complex structure is conjugate to the 
standard complex structure by an orthogonal conjugacy. 

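Let us construct the manifold of orthogonal complex structures as a homogeneous
space. Embed $\mathrm{U}(\mathbb{C}^n)$ into $G = \mathrm{SO}(\mathbb{E}^{2n})$ by

$$x = \alpha + i\beta \;\longmapsto\; \begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix} \qquad \forall x \in \mathrm{Hom}(\mathbb{C}^n,\mathbb{C}^n),$$

where $\alpha$ (resp. $\beta$) is the real (resp. imaginary) part of $x$. Let $\mathrm{SO}(\mathbb{E}^{2n})$ act on its
Lie algebra $\mathfrak{so}_{2n}$ by conjugation. The standard complex structure on $\mathbb{R}^{2n}$ is
the element

$$\vartheta = \begin{pmatrix} 0 & 1_n \\ -1_n & 0 \end{pmatrix},$$

which has the orbit $G\cdot\vartheta = \mathrm{SO}_{2n}/\mathrm{U}_n$ since $\mathrm{U}_n$ is the stabilizer group of $\vartheta$.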

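The tangent and normal spaces to $G\cdot\vartheta$ at $\vartheta$ are equal to

$$T_\vartheta(G\cdot\vartheta) = \mathrm{ad}_{\mathfrak{p}}\,\vartheta, \qquad N_\vartheta(G\cdot\vartheta) = \mathfrak{u}_n, \qquad \mathfrak{p} = \left\{ \begin{pmatrix} \delta & \gamma \\ \gamma & -\delta \end{pmatrix} : \delta, \gamma \in \mathfrak{so}_n \right\},$$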

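where $\mathfrak{p}$ is the orthogonal complement of $\mathfrak{u}_n$ in $\mathfrak{so}_{2n}$. It follows from the fact that
every $x \in \mathfrak{so}_{2n}$ is contained in a Cartan subalgebra that

$$\exists\, g \in \mathrm{SO}_{2n} \text{ such that } g'xg = a = \begin{pmatrix} 0_n & \mathrm{diag}(a_1,\ldots,a_n) \\ -\mathrm{diag}(a_1,\ldots,a_n) & 0_n \end{pmatrix},$$

and the stabilizer of $a$ is contained in $\mathrm{U}_n$ provided that $\det a \neq 0$. Thus, the
tubular neighbourhood $T$ of $G\cdot\vartheta$ is the connected component containing $\vartheta$ of the
set of $x \in \mathfrak{so}_{2n}$ such that $\det x \neq 0$. The projection map $\pi(x) = g\vartheta g'$ is therefore
defined on $T$.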
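
The projection onto the manifold of orthogonal complex structures can be illustrated numerically. The following is a minimal sketch, assuming that for $\det x \neq 0$ the projection $g\vartheta g'$ agrees with the orthogonal polar factor of $x$ (this holds when all $a_i > 0$ in the normal form above). The function names are hypothetical and the computation uses NumPy's singular value decomposition rather than an explicit Cartan subalgebra.

```python
import numpy as np

def standard_complex_structure(n):
    """The element [[0, 1_n], [-1_n, 0]] of so(2n)."""
    zero, one = np.zeros((n, n)), np.eye(n)
    return np.block([[zero, one], [-one, zero]])

def project_to_complex_structure(x, tol=1e-12):
    """Orthogonal polar factor u @ vt of a nonsingular skew-symmetric x.

    Assumption (not taken from the text): on the tubular neighbourhood the
    projection g * theta * g' coincides with this polar factor.
    """
    x = np.asarray(x, dtype=float)
    if not np.allclose(x, -x.T):
        raise ValueError("x must be skew-symmetric")
    u, s, vt = np.linalg.svd(x)
    if s.min() < tol:
        raise ValueError("det x = 0: the projection is not defined")
    return u @ vt

rng = np.random.default_rng(0)
n = 2
theta = standard_complex_structure(n)
noise = rng.normal(size=(2 * n, 2 * n))
x = theta + 0.1 * (noise - noise.T)      # skew-symmetric perturbation of theta
J = project_to_complex_structure(x)
print(np.allclose(J @ J, -np.eye(2 * n)), np.allclose(J.T, -J))
```

The polar factor of a nonsingular skew-symmetric matrix is again skew-symmetric and orthogonal, so it squares to $-1$; this is why no explicit diagonalisation is needed in the sketch.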

Remark 2.8. One can also define the manifold of unitary quaternionic structures
$J$ on $\mathbb{C}^{2n}$. In this case, the homogeneous space is $\mathrm{SU}(\mathbb{C}^{2n})/\mathrm{Sp}(\mathbb{H}^n)$ and the
construction is essentially the same as above.

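2.3.8. Adjoint orbits. Let $G \subset H$ be compact Lie groups and let $\mathfrak{g} \subset \mathfrak{h}$ be their
Lie algebras. The negative Cartan-Killing form on $\mathfrak{h}$, $\langle x, y\rangle = -\mathrm{Tr}(\mathrm{ad}_x \cdot \mathrm{ad}_y)$,
defines a $G$-invariant euclidean structure, where $G$ acts on $\mathfrak{h}$ by the adjoint action
(conjugation) [7]. Let $\vartheta \in \mathfrak{h}$ and let $G_\vartheta$ be the stabilizer of $\vartheta$ in $G$, and $\mathfrak{g}_\vartheta$ be its
Lie algebra. One has

$$G\cdot\vartheta = G/G_\vartheta,$$

and

$$T_\vartheta(G\cdot\vartheta) = \mathrm{ad}_{\mathfrak{g}}\,\vartheta, \qquad N_\vartheta(G\cdot\vartheta) = \mathrm{ad}_\vartheta^{-1}(\mathfrak{g}^{\perp}).$$

One knows that there is an equivariant tubular neighbourhood $T$ of $G\cdot\vartheta$ such that
the projection map $\pi : T \to G\cdot\vartheta$ is defined. The examples above may be formulated
in these terms.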






2.3.9. The group G itself. If $G \subset \mathrm{O}(\mathbb{E}^n)$, then $G \subset \mathrm{Hom}(\mathbb{E}^n,\mathbb{E}^n) = E$. With the
euclidean structure on $E$ defined as in (7), one obtains a decomposition

$$E = \mathfrak{g} + \mathfrak{p} = T_1G + N_1(G).$$

The projection map of the tubular neighbourhood $T$ can be defined as follows: if, given
$x \in E$, there is a unique $g \in G$ such that $g^{-1}x \in \mathfrak{p}$, then $\pi(x) = g$. We see that
the projection is a generalization of the polar decomposition encountered above in
the $k = n$ case of example 2.3.2.

3. Regression problems 

This section deals primarily with the first-order conditions for a class of non-linear
regression problems. Although section 2 showed the construction
of second-order minimax estimators, the geometry that underlies those constructions
is very similar to that required here. We also give a numerical example of the
derived regressor in a specific setting.

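Let $\theta_1, \ldots, \theta_k$ be a collection of design points on a manifold $\Theta$ and let $y_1, \ldots, y_k$
be a random sample of points on a manifold $\Lambda$ embedded in a euclidean space as in
diagram (14). Let the conditional probability density of $y$ given $\theta \in \Theta$ be $f(y|\gamma(\theta))$,
where $\gamma$ is an unknown map.

$$\begin{array}{ccc} & & (E,\sigma) \\ & & \uparrow{\scriptstyle j} \\ (\Theta,g) & \xrightarrow{\ \gamma\ } & (\Lambda,h), \end{array} \qquad (14)$$

One is interested in estimating the unknown map $\gamma$ by, say, minimizing the discrepancy
function

$$s(\gamma) = \frac{1}{k}\sum_{i=1}^{k} \ell(y_i, \gamma(\theta_i)). \qquad (15)$$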

The loss function $\ell : \Lambda \times \Lambda \to \mathbb{R}$ is assumed to satisfy

(1) $\ell$ is continuous everywhere and smooth a.e.;

(2) $\ell(x, y) = \ell(y, x)$ for all $x, y \in \Lambda$;

(3) $\ell(x, y) \geq 0$ for all $x, y \in \Lambda$ and equals $0$ iff $x = y$;

(4) the hessian $\nabla d\ell|_{\Delta\Lambda}$ is non-degenerate, where $\Delta\Lambda = \{(x,x) : x \in \Lambda\}$ is
the diagonal.

Natural examples of loss functions include: 1) that induced by the euclidean structure,
$\ell(x,y) = |jx - jy|^2$ for all $x, y \in \Lambda$; and 2) that induced by the intrinsic
riemannian distance on $\Lambda$, $\ell(x,y) = \mathrm{dist}(x,y)^2$ for all $x, y \in \Lambda$.
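
The two loss functions can be compared concretely on the unit sphere, where both have closed forms. The following is a minimal sketch, assuming $\Lambda = S^2 \subset \mathbb{E}^3$; the function names `extrinsic_loss` and `intrinsic_loss` are hypothetical.

```python
import numpy as np

def extrinsic_loss(x, y):
    """Squared euclidean (chordal) distance |x - y|^2 of two points in E^3."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(diff @ diff)

def intrinsic_loss(x, y):
    """Squared great-circle distance dist(x, y)^2 of two unit vectors."""
    cos_angle = np.clip(np.dot(x, y), -1.0, 1.0)   # guard against rounding
    return float(np.arccos(cos_angle) ** 2)

e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
print(extrinsic_loss(e1, e2), intrinsic_loss(e1, e2))
```

Both functions are symmetric, non-negative and vanish only on the diagonal; on the sphere they are related by $|x - y|^2 = 2 - 2\cos(\mathrm{dist}(x,y))$, so they agree to second order near the diagonal.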

If one assumes that the space of admissible maps $\gamma$ is parameterized by a compact
finite-dimensional manifold $\Gamma$, a solution to this estimation problem is

$$\hat\gamma = \mathrm{argmin}\,\{s(\gamma) : \gamma \in \Gamma\}. \qquad (16)$$

To highlight the geometry and minimise the analysis, it is assumed throughout
that the space of admissible maps $\Gamma$ is a compact finite-dimensional submanifold of
$C^\infty(\Theta,\Lambda)$.

3.1. A first-order condition for the least-squares solution. To state the first-order
condition for a solution to (16), one needs some results from differential
topology. The space $C^\infty(\Theta,\Lambda)$ may be equipped with the structure of a Fréchet
manifold and one may consider $\Gamma$ as a smooth submanifold. There are canonical
smooth maps $\mathrm{ev}_\theta : \Gamma \to \Lambda$ defined by

$$\Theta \times \Gamma \xrightarrow{\ 1 \times \mathrm{incl.}\ } \Theta \times C^\infty(\Theta,\Lambda) \xrightarrow{\ \mathrm{ev}\ } \Lambda,$$

where $\mathrm{ev}(\theta,\gamma) = \gamma(\theta)$. One writes $\mathrm{ev}_i$ for $\mathrm{ev}_{\theta_i}$.



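Proposition 3.1. The first-order condition for $\hat\gamma \in \Gamma$ to be a minimizer of $s$ (15)
is that $d_{\hat\gamma}s \in N_{\hat\gamma}(\Gamma) \subset T^*_{\hat\gamma}C^\infty(\Theta,\Lambda)$. That is,

$$\sum_{i=1}^{k} (d_{\lambda_i}j \cdot d_{\hat\gamma}\mathrm{ev}_i)^* \cdot \bigl(j(y_i) - j\,\mathrm{ev}_i(\hat\gamma)\bigr) = 0 \qquad \text{where } \lambda_i = \hat\gamma(\theta_i). \qquad (17)$$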



Remark 3.2. We have seen above that compact group orbits are important examples
of smooth manifolds, and each of these lies within a sphere of constant radius.
Thus, if $\Lambda$ is contained in a sphere of constant radius, proposition 3.1 yields the
first-order condition

$$\sum_{i=1}^{k} (d_{\lambda_i}j \cdot d_{\hat\gamma}\mathrm{ev}_i)^* \cdot j(y_i) = 0,$$

since $j\,\mathrm{ev}_i(\hat\gamma)$ is then normal to $\Lambda$ at $\lambda_i$.



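Proof of Proposition 3.1. Let us recall the definition of a tangent vector $v \in T_\gamma C^\infty(\Theta,\Lambda)$.
One may view $v$ as the derivative at $t = 0$ of an equivalence class of smooth curves
$\gamma_t$ with $\gamma_{t=0} = \gamma$. For each $\theta \in \Theta$, $\gamma_t(\theta)$ is a smooth curve on $\Lambda$ through $\gamma(\theta)$.
Thus, a tangent vector $v \in T_\gamma C^\infty(\Theta,\Lambda)$ is a smooth map $v : \Theta \to T\Lambda$ such that
$v(\theta) \in T_{\gamma(\theta)}\Lambda$ for all $\theta$ (differential geometers say that $v$ is a smooth section of
$\gamma^*T\Lambda$). It follows that if $w \in T_\gamma\Gamma$, then $d_\gamma\mathrm{ev}_i \cdot w$ is a tangent vector in $T_{\gamma(\theta_i)}\Lambda$.
If $v \in T_\gamma\Gamma$ is a tangent vector, then the chain rule shows that

$$d_\gamma s \cdot v = -\frac{2}{k}\sum_{i=1}^{k} \Bigl[ \langle j(y_i),\, d_{\lambda_i}j \cdot d_\gamma\mathrm{ev}_i \cdot v\rangle - \langle j\,\mathrm{ev}_i(\gamma),\, d_{\lambda_i}j \cdot d_\gamma\mathrm{ev}_i \cdot v\rangle \Bigr]$$

$$\phantom{d_\gamma s \cdot v} = -\frac{2}{k}\sum_{i=1}^{k} \bigl\langle (d_{\lambda_i}j \cdot d_\gamma\mathrm{ev}_i)^* \cdot \bigl(j(y_i) - j\,\mathrm{ev}_i(\gamma)\bigr),\, v \bigr\rangle,$$

where one uses the fact that the euclidean structure on $E$ allows one to identify
$T_{\cdot}E$ and $T^*_{\cdot}E$. This yields (17). □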



3.1.1. Least-squares for linear maps. Assume that $\Theta, \Lambda$ are isometrically embedded
in euclidean spaces $E_0, E_1$ with inclusion maps $\iota, j$ respectively.

Definition 3.3. Let $\Gamma \subset C^\infty(\Theta,\Lambda)$. One says that $\Gamma$ is a set of linear maps if
there is a subset $\mathcal{A} \subset \mathrm{Hom}(E_0,E_1)$ whose elements map $\Theta$ into $\Lambda$ such that $\mathcal{A}\iota = j\Gamma$.



A Fréchet space is a Hausdorff, locally convex vector space, with a complete translationally
invariant metric [15]. A Fréchet manifold is a Hausdorff topological space with an atlas of smooth
coordinate charts into a Fréchet space.
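
In other words, $\Gamma$ is a set of linear maps if, for each $\gamma \in \Gamma$, there is an $A \in \mathcal{A}$
such that the following commutes:

$$\begin{array}{ccc} E_0 & \xrightarrow{\ A\ } & E_1 \\ {\scriptstyle \iota}\uparrow & & \uparrow{\scriptstyle j} \\ \Theta & \xrightarrow{\ \gamma\ } & \Lambda \end{array}$$

Since $j$ is a smooth embedding (resp. $\iota$ is a smooth immersion when $\Theta$ spans
$E_0$), it is permissible to abuse notation and identify $\Gamma$ and $\mathcal{A}$ as smooth manifolds.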

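Corollary 3.4. Let $\Theta \subset E_0$ be a spanning set and let $\Gamma$ be a submanifold of linear
maps. If $\hat\gamma$ is a least-squares solution to (16), then the matrices

$$\nu = \sum_{i=1}^{k} y_i \otimes \theta_i', \qquad \tau = \sum_{i=1}^{k} \theta_i \otimes \theta_i' \qquad (18)$$

satisfy

$$\nu = \hat\gamma\,\tau \mod N_{\hat\gamma}(\Gamma), \qquad (19)$$

where $T_{\hat\gamma}\mathrm{Hom}(E_0,E_1) = T_{\hat\gamma}\Gamma \oplus N_{\hat\gamma}(\Gamma)$, and $\nu \in \mathrm{Hom}(E_0,E_1)$, $\tau \in
\mathrm{Hom}(E_0,E_0)$.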

The condition (19) specialises to the least-squares regression formula when $\Lambda = E^1$,
$\Theta = E^n$ and $\Gamma$ is the space of linear functions $\mathrm{Hom}(E^n,E^1)$, so that the normal
space is trivial. In the usual least-squares regression formula, the coefficient vector
is viewed as a column vector, whereas here one views the coefficient vector as a
linear function and hence a row vector. The standard formula is recovered by
transposing the normal equations (19).
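
The specialisation to ordinary least squares can be checked numerically. The following is a minimal sketch, assuming scalar responses and synthetic data (the coefficient row vector `gamma_true` and the noise level are hypothetical); it forms the matrices of (18), solves $\hat\gamma\tau = \nu$ and compares the transpose with NumPy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 3, 50
thetas = rng.normal(size=(k, d))                 # rows are the design points theta_i
gamma_true = np.array([[2.0, -1.0, 0.5]])        # hypothetical 1 x d row vector
ys = thetas @ gamma_true.T + 0.01 * rng.normal(size=(k, 1))

nu = ys.T @ thetas          # sum_i y_i (x) theta_i'      (1 x d)
tau = thetas.T @ thetas     # sum_i theta_i (x) theta_i'  (d x d, symmetric)

# Normal equations with trivial normal space: gamma_hat @ tau = nu.
gamma_hat = np.linalg.solve(tau, nu.T).T

# Column-vector form of the same estimator, for comparison.
beta, *_ = np.linalg.lstsq(thetas, ys, rcond=None)
print(np.allclose(gamma_hat.T, beta))
```

Here `gamma_hat` is the row-vector form described in the text and `beta` is the usual column-vector form; they are transposes of one another.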

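An especially useful application of corollary 3.4 is when $\Theta = \Lambda$ and $\Gamma \subset \mathrm{Hom}(E)$
is a group. In this case the first-order condition simplifies to

$$\nu\hat\gamma' = \hat\gamma\,\tau\,\hat\gamma' \mod N_1(\Gamma), \qquad (20)$$

and when $\Gamma \subset \mathrm{O}(E)$, since $\tau$ is symmetric,

$$\hat\gamma'\nu = 0 \mod N_1(\Gamma). \qquad (21)$$

In other words, $\hat\gamma$ is the orthogonal projection onto $\Gamma$ of the matrix $\nu$.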

Remark 3.5. Kim [10] looks at the spherical regression problem where one has
$n$ known design points $x_i$ on $S^2$ and there are $n$ observations $y_i$ on $S^2$ which are
distributed about $ax_i$, where $a \in \mathrm{SO}(\mathbb{E}^3)$ is unknown. For a uniform bayesian prior
on $\mathrm{SO}(\mathbb{E}^3)$ and the discrepancy function $s(a) = \sum_{i=1}^{n} |y_i - ax_i|^2$, Kim shows that the
bayesian estimator is the "least-squares" estimator obtained as follows. Let

$$\chi = \frac{1}{n}\,YX' \qquad \text{where } Y = [y_1 \cdots y_n],\ X = [x_1 \cdots x_n]$$

$$\phantom{\chi} = u\sigma v', \qquad u, v \in \mathrm{SO}(\mathbb{E}^3),\ \sigma = \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3) \qquad (22)$$

with $\sigma_1 \geq \sigma_2 \geq |\sigma_3|$.

The least-squares estimator is then

$$\hat a = uv'.$$

The singular-value-like decomposition of $\chi$ in (22) can be rewritten
to obtain a polar-like decomposition of $\chi$,

$$\chi = uv'(v\sigma v') = gp, \qquad g = uv' \in \mathrm{SO}(\mathbb{E}^3),\ p = v\sigma v' \in \mathrm{sym}(\mathbb{E}^3),$$

whence

$$\hat a = \pi(\chi) \qquad \text{provided } \sigma_3 > 0.$$

One can see that (21) generalises Kim's [10] formula for the spherical regression
problem.
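
The estimator $\hat a = uv'$ is straightforward to compute. The following is a minimal sketch on synthetic noiseless data (the rotation `a_true` and the design points are hypothetical). NumPy's SVD returns factors in $\mathrm{O}(3)$, so the sketch flips the last singular direction where needed to obtain $u, v \in \mathrm{SO}(3)$, which corresponds to allowing $\sigma_3 < 0$ in (22).

```python
import numpy as np

def project_to_so3(chi):
    """Return u @ v' where chi = u @ diag(sigma) @ v' with u, v in SO(3)."""
    u, _, vt = np.linalg.svd(chi)
    # Flip the last column/row so that both factors have determinant +1;
    # the sign is absorbed into sigma_3.
    if np.linalg.det(u) < 0:
        u[:, -1] *= -1
    if np.linalg.det(vt) < 0:
        vt[-1, :] *= -1
    return u @ vt

rng = np.random.default_rng(2)
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
a_true = q * np.sign(np.linalg.det(q))           # a rotation with det = +1
X = rng.normal(size=(3, 40))
X /= np.linalg.norm(X, axis=0)                   # columns are design points on S^2
Y = a_true @ X                                   # noiseless observations
chi = Y @ X.T / X.shape[1]
a_hat = project_to_so3(chi)
print(np.allclose(a_hat, a_true))
```

In the noiseless case $\chi = a\,(XX'/n)$ with $XX'/n$ symmetric positive definite, so the polar factor recovers $a$ exactly; with noise the same function returns the least-squares estimate.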

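Proof of Corollary 3.4. Since $\Gamma$ acts linearly on $E_0$, one sees that the normal equations
(17) are linear in $\gamma$ and simplify to: for all $v \in T_\gamma\Gamma$

$$0 = \sum_{i=1}^{k} \langle y_i - \gamma\cdot\theta_i,\ v\cdot\theta_i\rangle = \sum_{i=1}^{k} \langle y_i \otimes \theta_i' - \gamma\cdot\theta_i \otimes \theta_i',\ v\rangle \qquad (23)$$

where the second inner product is the trace inner product as in (7) and the inclusion
map $j$ is dropped to simplify notation. Rearranging (23) yields (19).

To arrive at (20), one notes that when $\Gamma \subset \mathrm{Hom}(E,E)$ is a group, then each
tangent vector $v \in T_\gamma\Gamma$ is of the form $v = \xi\cdot\gamma$ where $\xi \in T_1\Gamma$. The normal equations
(23) are then transformed to

$$0 = \sum_{i=1}^{k} \langle y_i \otimes \theta_i'\cdot\gamma' - \gamma\cdot\theta_i \otimes \theta_i'\cdot\gamma',\ \xi\rangle \qquad (24)$$

for all $\xi \in T_1\Gamma$. Rearranging (24) yields (20). □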

3.1.2. Regression with the intrinsic distance. Let $\mathrm{dist} : \Lambda \times \Lambda \to \mathbb{R}$ be the riemannian
distance of the riemannian manifold $(\Lambda, h)$. One can define the discrepancy
functional (15) using dist to be

$$s(\gamma) = \frac{1}{k}\sum_{l=1}^{k} \mathrm{dist}(y_l, \gamma(\theta_l))^2 \qquad (25)$$

for $\gamma \in C^\infty(\Theta,\Lambda)$.

For each $y \in \Lambda$, the function $x \mapsto \mathrm{dist}(y,x)^2$ is smooth on the open set of $x$
such that there is a unique minimising geodesic from $y$ to $x$. The set of $x$ on which
this function is not differentiable is contained in the cut locus of $y$, a closed, nowhere dense
subset of $\Lambda$. If $x$ is not in the cut locus of $y$, then there exists a unique shortest
tangent vector $w = w_y(x) \in T_x\Lambda$ such that $\exp_x w = y$ and $|w| = \mathrm{dist}(y,x)$. One
may write $w = \log_x y$; one knows that $w = w_y(x)$ is a smooth vector field off the
cut locus of $y$.

Proposition 3.6. For $x$ in the complement of the cut locus of $y$,

$$\frac{\partial}{\partial x}\,\mathrm{dist}(y,x)^2 = -2\log_x y,$$

where $T^*\Lambda$ and $T\Lambda$ are identified via the metric $h$.

Proof. Let $x_t$ be a smooth curve such that $x_{t=0} = x$ and $v = \frac{d}{dt}\big|_{t=0}\,x_t$. Let
$c_t(s)$ be the unique minimal geodesic from $y$ to $x_t$, so that $c_t(0) = y$ and $c_t(1) = x_t$. It is clear from figure 2 that the
derivative at $t = 0$ of $\frac{1}{2}\mathrm{dist}(y, x_t)^2$ is $\langle c'(1), v\rangle$ where $c = c_0$. Since there is a unique shortest
geodesic joining $y$ to $x$, reversibility shows that $c'(1) = -w_y(x) = -\log_x y$. □
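
Proposition 3.6 can be checked by finite differences on the unit sphere, where the logarithm has the closed form $\log_x y = \frac{\alpha}{\sin\alpha}\,(y - \cos\alpha\; x)$ with $\cos\alpha = \langle y, x\rangle$. The following is a minimal sketch; the function names, the test points and the step size `h` are illustrative choices, not taken from the text.

```python
import numpy as np

def sphere_log(x, y):
    """log_x y on the unit sphere, defined for y != -x."""
    cos_a = np.clip(np.dot(x, y), -1.0, 1.0)
    alpha = np.arccos(cos_a)
    if np.isclose(alpha, np.pi):
        raise ValueError("y lies in the cut locus of x")
    tangent = y - cos_a * x                      # component of y tangent at x
    scale = 1.0 if alpha < 1e-12 else alpha / np.sin(alpha)
    return scale * tangent

def dist2(y, x):
    """Squared great-circle distance."""
    return np.arccos(np.clip(np.dot(y, x), -1.0, 1.0)) ** 2

x = np.array([1.0, 0.0, 0.0])
y = np.array([np.cos(1.0), np.sin(1.0), 0.0])   # at distance 1 from x
v = np.array([0.0, 0.6, 0.8])                    # unit tangent vector at x

def curve(t):
    """A curve on the sphere with curve(0) = x and velocity v at t = 0."""
    p = x + t * v
    return p / np.linalg.norm(p)

h = 1e-5
fd = (dist2(y, curve(h)) - dist2(y, curve(-h))) / (2 * h)   # central difference
grad = -2.0 * np.dot(sphere_log(x, y), v)                   # Proposition 3.6
print(fd, grad)
```

The two printed numbers agree up to the discretisation error of the central difference, which illustrates the sign convention: the gradient of $\mathrm{dist}(y,\cdot)^2$ at $x$ points away from $y$.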



Proposition 3.7. If $\hat\gamma$ is a minimiser of $s$ (25), then either $s$ is differentiable
at $\hat\gamma$ and

$$d_{\hat\gamma}s = -\frac{2}{k}\sum_{l=1}^{k} (d_{\hat\gamma}\mathrm{ev}_l)^* \log_{\hat\gamma(\theta_l)} y_l = 0;$$

or there is an $l$ such that $\hat\gamma(\theta_l)$ lies in the cut locus of $y_l$.

Figure 2. The derivative of $\mathrm{dist}(y,x)^2$ (the figure shows the geodesic $c(s)$).

In the first case, where $s$ is defined via the extrinsic distance (15), proposition 3.1
results in a closed-form solution for the minimising estimator in many interesting
cases. The intrinsic distance leads to a system of normal equations which, even in
simple cases, appear opaque. However, there is additional information which one
may obtain from these equations. First, since $\Gamma$ is a finite-dimensional
manifold, let us equip it with some riemannian metric. It is well-known that the
hessian of a smooth function may be defined using riemannian structures, but that
this hessian at a critical point is independent of those structures. Thus, if one lets
$\phi_l(x) = \mathrm{dist}(y_l,x)^2$, then the calculus of second derivatives gives

$$\nabla ds|_\gamma = \frac{1}{k}\sum_{l=1}^{k} \Bigl[ \nabla d\phi_l(d_\gamma\mathrm{ev}_l, d_\gamma\mathrm{ev}_l) + d_{\gamma(\theta_l)}\phi_l \cdot \nabla d\,\mathrm{ev}_l \Bigr]$$

where $\nabla d\phi_l$ is the hessian of $\phi_l$, etc. One knows that $\nabla d\phi_l(v,w)$ is the second
variation of the energy functional $E[c] = \int_0^1 |c'(s)|^2\,ds$ along the Jacobi fields
determined by $v, w \in T_{\gamma(\theta_l)}\Lambda$ and the minimising geodesic $c$ from $y_l$ to $\gamma(\theta_l)$. If it
is assumed that neither of these points lies in the cut locus of the other, then this second
variation is necessarily positive. Thus, the only way for $\nabla ds$ to not be positive
definite is for one or more of the forms $d\phi_l \cdot \nabla d\,\mathrm{ev}_l$ to be negative definite along
some subspace. This cannot happen if $\nabla d\,\mathrm{ev}_l$ vanishes for all $l$.

Proposition 3.8. If there is a riemannian structure on $\Gamma$ such that $\nabla d\,\mathrm{ev}_l = 0$
for all $l$, and $\gamma \in \Gamma$ is a smooth critical point of $s$, then $\gamma$ is a local minimum.

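Example 3.9. Let $\Theta = \Lambda$ be the unit sphere $S^2$ in $\mathbb{E}^3$ and let $\Gamma = \mathrm{SO}(3)$ be the
group of orientation-preserving isometries of $S^2$. In this case, the distance function
is the angle between vectors

$$\mathrm{dist}(y,x) = \arccos\langle y, x\rangle$$

while the inverse to the exponential function is

$$\log_x y = \frac{\alpha}{\sin\alpha}\,(x \wedge y) \wedge x \qquad \text{where } \cos\alpha = \langle y, x\rangle, \qquad (26)$$

and $y \neq -x$ in (26). One computes that for each $\gamma \in \mathrm{SO}(3)$ and $\xi \in \mathfrak{so}(3)$

$$\frac{d}{dt}\Big|_{t=0}\,\mathrm{dist}(y, \gamma e^{t\xi}\theta)^2 = -2\,\langle \log_{\gamma\theta} y,\ \gamma\xi\theta\rangle,$$

so the pullback of $d_\gamma s$ to $T_1\mathrm{SO}(3)$ is

$$-2\,\langle \tau, \xi\rangle, \qquad \tau = \frac{1}{k}\sum_{l=1}^{k} \gamma'\log_{\gamma(\theta_l)} y_l \otimes \theta_l', \qquad \xi \in \mathfrak{so}(3).$$

Thus, $d_\gamma s$ vanishes iff $\tau$ is a symmetric matrix. But modulo $N_1(\mathrm{SO}(3)) = \mathrm{sym}(\mathbb{E}^3)$
one has

$$\tau = \gamma'\nu, \qquad \text{where } \nu = \frac{1}{k}\sum_{l=1}^{k} \frac{\alpha_l}{\sin\alpha_l}\; y_l \otimes \theta_l' \quad \text{and } \cos\alpha_l = \langle y_l, \gamma\theta_l\rangle, \qquad (27)$$

so one concludes that the first-order condition is that $\gamma$ is the orthogonal projection
of $\nu$ onto $\mathrm{SO}(3)$. This is similar to the least-squares condition, except that
in (27) the matrix $\nu$ is a function of $\gamma$ through the angles $\alpha_l$. However, if one writes
$y_l = \gamma\theta_l + \epsilon\,\eta_l$, and expands the matrix $\nu$ in the small parameter $\epsilon$, then one has
$\nu = \nu_1 + O(\epsilon^2)$ and $\nu_1$ formally is the same as $\nu$ in (18). In other words, the
intrinsic-distance regressor is a perturbation of the least-squares regressor.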
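
The first-order condition of this example lends itself to a simple iteration: project $\nu(\gamma)$ onto $\mathrm{SO}(3)$, recompute the angles, and repeat. The following is a minimal sketch under stated assumptions: the fixed-point scheme is an illustrative choice (the text does not say which solver was used), the data are synthetic, and all function names are hypothetical. The starting point is the least-squares projection, as in the perturbation argument above.

```python
import numpy as np

def project_to_so3(m):
    """Orthogonal projection of a 3 x 3 matrix onto SO(3) via the SVD."""
    u, _, vt = np.linalg.svd(m)
    if np.linalg.det(u) < 0:
        u[:, -1] *= -1
    if np.linalg.det(vt) < 0:
        vt[-1, :] *= -1
    return u @ vt

def weighted_nu(gamma, thetas, ys):
    """nu = (1/k) sum_l (alpha_l / sin alpha_l) y_l (x) theta_l' as in (27)."""
    cos_a = np.clip(np.sum(ys * (thetas @ gamma.T), axis=1), -1.0, 1.0)
    alpha = np.arccos(cos_a)
    weights = np.ones_like(alpha)
    mask = alpha > 1e-8                          # alpha / sin(alpha) -> 1 as alpha -> 0
    weights[mask] = alpha[mask] / np.sin(alpha[mask])
    return (weights[:, None] * ys).T @ thetas / len(ys)

def intrinsic_regressor(thetas, ys, iterations=100):
    """Fixed-point iteration gamma <- projection of nu(gamma) onto SO(3)."""
    gamma = project_to_so3(ys.T @ thetas)        # least-squares starting point
    for _ in range(iterations):
        gamma = project_to_so3(weighted_nu(gamma, thetas, ys))
    return gamma

rng = np.random.default_rng(3)
k, sigma = 100, 0.1
thetas = rng.normal(size=(k, 3))
thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)    # rows on S^2
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
gamma_true = q * np.sign(np.linalg.det(q))
u = thetas @ gamma_true.T + sigma * rng.normal(size=(k, 3))
ys = u / np.linalg.norm(u, axis=1, keepdims=True)

gamma_hat = intrinsic_regressor(thetas, ys)
tau = gamma_hat.T @ weighted_nu(gamma_hat, thetas, ys)
print(np.linalg.norm(tau - tau.T))               # residual of the first-order condition
```

At a fixed point $\hat\gamma'\nu(\hat\gamma)$ is symmetric, which is the first-order condition; because the weights differ from $1$ only at second order in the angles, the iteration stays close to the least-squares start when the noise is small.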

In this example, $\mathrm{ev}_l(\gamma) = \gamma(\theta_l)$, so the map $\mathrm{ev}_l : \mathrm{SO}(3) \to S^2$ is the canonical
projection map. In particular, this map has vanishing hessian, $\nabla d\,\mathrm{ev}_l = 0$,
so proposition 3.8 implies that a smooth solution to the first-order condition $\tau = 0
\mod \mathrm{sym}(\mathbb{E}^3)$ is a local minimiser of $s$. Moreover, one knows

Lemma 3.10. Let $\mathfrak{N}$ be the set of $\gamma$ at which $s$ is not differentiable. Then $\mathfrak{N}$ is a
union of translates of subgroups isomorphic to $\mathrm{SO}(2)$.

If $\hat\gamma$ is a local minimum point of $s$, then $s$ is differentiable at $\hat\gamma$. In particular,
the regression estimator $\hat\gamma$ satisfies the property that

$$\hat\gamma'\nu = 0 \mod N_1(\mathrm{SO}(3))$$

where $\nu$ is defined in (27).

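Proof. Since $\mathrm{dist}(y,x)$ is differentiable in $x$ on the set $S^2 - \{-y\}$, $s$ is differentiable
at $\gamma$ iff $\gamma(\theta_l) \neq -y_l$ for all $l$. Thus, if $s$ is not differentiable at $\gamma$, then there is an $l_0$
such that $\gamma(\theta_{l_0}) = -y_{l_0}$. If $\gamma_l$ is some solution to $\gamma(\theta_l) = -y_l$, then the set of
all solutions to the latter is $\gamma_l \cdot \mathrm{stab}(\theta_l)$, which is a translate of a group isomorphic
to $\mathrm{SO}(2)$. Thus $\mathfrak{N} = \bigcup_{l=1}^{k} \gamma_l \cdot \mathrm{stab}(\theta_l)$.

Let $\hat\gamma$ be a local minimum point of $s$. Assume that $\hat\gamma \in \mathfrak{N}$. Without loss of
generality, it can be assumed that there is an $l_0 > 0$ such that $\hat\gamma(\theta_l) = -y_l$ (resp.
$\hat\gamma(\theta_l) \neq -y_l$) for $l \leq l_0$ (resp. $l > l_0$).

Let $s_0$ (resp. $s_1$) be the part of $s$ contributed for $l \leq l_0$ (resp. $l > l_0$). Then,

$$s_0(\hat\gamma) = \frac{l_0\pi^2}{k}, \qquad d_{\hat\gamma}s_1 = 0.$$

Moreover, if $y, x \in S^2$, then since $y, -y, x$ is a degenerate triangle, $\mathrm{dist}(y,x) =
\mathrm{dist}(y,-y) - \mathrm{dist}(-y,x) = \pi - \mathrm{dist}(-y,x)$. Therefore, one knows that

$$s_0(\gamma) = \frac{1}{k}\sum_{l \leq l_0} \bigl(\pi - \mathrm{dist}(-y_l, \gamma(\theta_l))\bigr)^2 = \frac{l_0\pi^2}{k} - \frac{2\pi}{k}\sum_{l \leq l_0} \mathrm{dist}(\hat\gamma(\theta_l), \gamma(\theta_l)) + O(|\gamma - \hat\gamma|^2),$$

$$s_1(\gamma) = s_1(\hat\gamma) + O(|\gamma - \hat\gamma|^2).$$

Since the orbit map $\mathrm{ev}_l : \mathrm{SO}(3) \to S^2$ is a riemannian submersion, there are $\gamma$ such
that, for a fixed $l$, $\mathrm{dist}(\gamma(\theta_l), \hat\gamma(\theta_l)) = \mathrm{dist}(\gamma,\hat\gamma)$. This implies that $s_0$ decreases
along $\gamma$ more than $s_1$ increases. But $\hat\gamma$ is a local minimum, which is absurd. Therefore, if $\hat\gamma$
is a local minimum, then $s$ is differentiable at $\hat\gamma$. □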

Let $R_j(s)$ be the counterclockwise rotation of $\mathbb{E}^3$ by $s$ radians in the plane
orthogonal to the $j$-th standard basis vector. Elements of $\mathrm{SO}(3)$ may be parameterised
in terms of '3-1-3' Euler angles: $\gamma = R_3(a)R_1(b)R_3(c)$ where $a, c \in [0, 2\pi]$
and $b \in [0,\pi]$. In figure 3 one has an empirical distribution of the regressor
$\hat\gamma = \hat\gamma(y)$ in Euler angles. For $k = 100$ design points $\theta_l$, drawn from the uniform
distribution on $S^2$, $y_l = u_l/|u_l|$ where $u_l = \gamma\theta_l + \sigma\cdot\epsilon_l$ and $\epsilon_l$ is an i.i.d. gaussian
in $\mathbb{E}^3$. $N = 1000$ draws are made and the first-order condition is numerically
solved for $\sigma = 0.1$ to $0.9$ in increments of $0.1$. All computations are performed in
Octave [5]. The starting point for the numerical solution of (27) is provided by
the orthogonal projection of $\sum_l y_l \otimes \theta_l'$ onto $\mathrm{SO}(3)$.
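
The '3-1-3' parameterisation can be made explicit as follows. This is a minimal sketch in Python rather than Octave (the text's computations were done in Octave; the code here is an independent illustration with hypothetical function names), and the inverse map is only valid for $0 < b < \pi$.

```python
import numpy as np

def rot(j, s):
    """Counterclockwise rotation by s radians about the j-th basis vector (j = 1, 2, 3)."""
    c, sn = np.cos(s), np.sin(s)
    if j == 1:
        return np.array([[1.0, 0.0, 0.0], [0.0, c, -sn], [0.0, sn, c]])
    if j == 2:
        return np.array([[c, 0.0, sn], [0.0, 1.0, 0.0], [-sn, 0.0, c]])
    if j == 3:
        return np.array([[c, -sn, 0.0], [sn, c, 0.0], [0.0, 0.0, 1.0]])
    raise ValueError("j must be 1, 2 or 3")

def from_euler313(a, b, c):
    """gamma = R_3(a) R_1(b) R_3(c)."""
    return rot(3, a) @ rot(1, b) @ rot(3, c)

def to_euler313(gamma):
    """Euler angles (a, b, c) of gamma, with a, c in [0, 2*pi) and 0 < b < pi."""
    b = np.arccos(np.clip(gamma[2, 2], -1.0, 1.0))
    a = np.arctan2(gamma[0, 2], -gamma[1, 2]) % (2 * np.pi)
    c = np.arctan2(gamma[2, 0], gamma[2, 1]) % (2 * np.pi)
    return a, b, c

gamma = from_euler313(0.3, 1.1, 4.0)
angles = to_euler313(gamma)
print(angles)
```

The extraction uses the third column $(\sin a \sin b,\ -\cos a \sin b,\ \cos b)$ and third row $(\sin b \sin c,\ \sin b \cos c,\ \cos b)$ of $R_3(a)R_1(b)R_3(c)$; at $b = 0$ or $b = \pi$ the angles $a$ and $c$ are not separately determined.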

Figure 4 shows the histograms of the normalised empirical distributions of the
Euler angles of the regressor $\hat\gamma = \hat\gamma(y)$ and reports the Kolmogorov-Smirnov p-value
for normality. The normalised Euler angles are of the form $\xi = C^{-1}x$, where $x$ is
the vector of the regressor's Euler angles, and the sample covariance matrix of $x$ is $CC'$.

Figure 3. The empirical distribution of the regressor $\hat\gamma = \hat\gamma(y)$.
See text for further information.



Figure 4. The histogram of the normalised deviations from the
mean of the regressor $\hat\gamma = \hat\gamma(y)$. The p-value for the Kolmogorov-Smirnov
test of normality is reported.

3.2. A bayesian approach. Let $y = (y_1, \ldots, y_k) \in \Phi = \Lambda^k$ and let $\lambda(\gamma)\,d\gamma$ be a
bayesian prior on $\Gamma$ ($\Gamma$ is only assumed to be a smooth submanifold of $C^\infty(\Theta,\Lambda)$
at this point). Let $\ell : \Gamma \times \Gamma \to \mathbb{R}$ be a loss function as defined in the introduction
to section 3 and assume that $f(y|\gamma) = \prod_i f(y_i|\gamma(\theta_i))$ is the conditional density of
$y$. The bayesian risk of $\hat\gamma \in \Gamma$ is then

$$\mathcal{R}(\hat\gamma) = \int_{\gamma\in\Gamma}\int_{y\in\Phi} \ell(\hat\gamma,\gamma)\,f(y|\gamma)\,\lambda(\gamma)\,dy\,d\gamma.$$

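One can define quantities

$$\mu(y) = \int_{\gamma\in\Gamma} f(y|\gamma)\,\lambda(\gamma)\,d\gamma, \qquad \lambda(\gamma|y) = \frac{f(y|\gamma)\,\lambda(\gamma)}{\mu(y)}, \qquad \mathcal{R}(\hat\gamma|y) = \int_{\gamma\in\Gamma} \ell(\hat\gamma,\gamma)\,\lambda(\gamma|y)\,d\gamma$$

to arrive at

$$\mathcal{R}(\hat\gamma) = \int_{y\in\Phi} \mathcal{R}(\hat\gamma|y)\,\mu(y)\,dy,$$

where dependence on the design points $\theta_i$ has been omitted for notational compactness.
Therefore, one can choose a bayesian estimator $\hat\gamma$ by minimising the posterior
risk

$$g(y) = \mathrm{argmin}\,\{\mathcal{R}(\hat\gamma|y) : \hat\gamma \in \Gamma\}, \qquad g : \Phi \to \Gamma. \qquad (28)$$

Since $\Gamma$ is assumed to be compact, the bayesian estimator $g$ is defined for all $y$ and
is measurable. If, in addition, $\ell$ is smooth, then $\mathcal{R}(\hat\gamma|y)$ is smooth in $\hat\gamma$.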

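The following notation is useful in formulating the first-order necessary condition
to determine $g(y)$. Let $\ell = \ell(\hat\gamma,\gamma)$ be a smooth function that is defined for all pairs
of maps $\hat\gamma, \gamma$ in $C^\infty(\Theta,\Lambda)$. One may view $\ell$ as a function of $\hat\gamma$ depending on the
parameter $\gamma$. Let

$$\frac{\partial\ell}{\partial\hat\gamma} \in T^*_{\hat\gamma}C^\infty(\Theta,\Lambda) \qquad (29)$$

be the 1-form defined by fixing $\gamma$ and taking the derivative with respect to $\hat\gamma$. In
this case, the map $\gamma \mapsto \partial\ell/\partial\hat\gamma$ is a smooth map from $C^\infty(\Theta,\Lambda)$ to the vector space
$T^*_{\hat\gamma}C^\infty(\Theta,\Lambda)$.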

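Proposition 3.11. Assume that the loss function $\ell$ is a smooth function on $C^\infty(\Theta,\Lambda) \times
C^\infty(\Theta,\Lambda)$. Then,

$$\frac{\partial}{\partial\hat\gamma}\,\mathcal{R}(\hat\gamma|y) = \int_{\gamma\in\Gamma} \frac{\partial\ell}{\partial\hat\gamma}\,\lambda(\gamma|y)\,d\gamma.$$

If $\hat\gamma = g(y)$ is a bayesian estimator satisfying (28), then

$$\frac{\partial}{\partial\hat\gamma}\,\mathcal{R}(\hat\gamma|y) \ \text{lies in}\ N_{\hat\gamma}(\Gamma) \subset T^*_{\hat\gamma}C^\infty(\Theta,\Lambda).$$

The proof of this proposition is straightforward. One observes that the integral
on the right-hand side is well defined since, by (29), one is integrating a smooth
function which takes values in a single vector space.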

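3.2.1. The squared-norm loss function. As $(\Lambda, h)$ is assumed to be isometrically
embedded in $(E, \sigma)$ as in (14), one may define an $L^2$ metric on $C^\infty(\Theta,\Lambda)$ by means
of the ambient euclidean structure

$$|\gamma|^2 = \int_{\theta\in\Theta} |\gamma(\theta)|^2\,d\theta \qquad \forall\gamma \in C^\infty(\Theta,\Lambda). \qquad (30)$$

A natural squared-norm loss function is then

$$\ell(\hat\gamma,\gamma) = |\hat\gamma - \gamma|^2 \qquad \forall\hat\gamma, \gamma \in C^\infty(\Theta,\Lambda). \qquad (31)$$

(The requisite $j$'s in (30)-(31) are suppressed for simplicity.)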

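Proposition 3.12. Let $\Gamma \subset C^\infty(\Theta,\Lambda)$ be a smooth submanifold and the loss function
$\ell$ be defined as in (31). If $\hat\gamma = g(y)$ is a bayesian estimator as in (28), then

$$\bar\gamma := \int_{\gamma\in\Gamma} j\gamma\,\lambda(\gamma|y)\,d\gamma \quad \text{satisfies} \quad j\hat\gamma - \bar\gamma \in N_{\hat\gamma}(\Gamma) \subset T^*_{\hat\gamma}C^\infty(\Theta,E). \qquad (32)$$

Remark 3.13. One considers $j\hat\gamma$ to be a form in $T^*_{\hat\gamma}C^\infty(\Theta,E)$ and not in $T^*_{\hat\gamma}C^\infty(\Theta,\Lambda)$
in equation (32) due to the natural embedding $\Lambda \subset E$.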

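Proof. In this case, the smoothness of $\mathcal{R}(\hat\gamma|y)$ in $\hat\gamma$ is immediate from the loss
function. One computes that

$$\frac{\partial\ell}{\partial\hat\gamma} = 2\int_{\theta\in\Theta} (d_{\hat\gamma(\theta)}j)^*\,\bigl(j\hat\gamma(\theta) - j\gamma(\theta)\bigr)\,d\theta \in T^*_{\hat\gamma}C^\infty(\Theta,E),$$

whence

$$\frac{\partial}{\partial\hat\gamma}\,\mathcal{R}(\hat\gamma|y) = 2\int_{\theta\in\Theta} d\theta\ (d_{\hat\gamma(\theta)}j)^* \left\{ \int_{\gamma\in\Gamma} d\gamma\,\lambda(\gamma|y)\,\bigl(j\hat\gamma(\theta) - j\gamma(\theta)\bigr) \right\}. \qquad (33)$$

Proposition 3.11 shows that the left-hand side of (33) lies in $N_{\hat\gamma}(\Gamma)$ if $\hat\gamma$ is a bayesian
estimator. Define $\zeta \in T^*_{\hat\gamma}C^\infty(\Theta,E)$ by

$$\zeta(\theta) = \int_{\gamma\in\Gamma} d\gamma\,\lambda(\gamma|y)\,\bigl(j\hat\gamma(\theta) - j\gamma(\theta)\bigr).$$

One observes that the right-hand side of (33) vanishes on $T_{\hat\gamma}\Gamma$ if $\zeta$ vanishes on $T_{\hat\gamma}\Gamma$,
and $\zeta$ vanishes on $T_{\hat\gamma}\Gamma$ if $\zeta$ vanishes, i.e., if

$$j\hat\gamma(\theta) = \int_{\gamma\in\Gamma} d\gamma\,\lambda(\gamma|y)\,j\gamma(\theta) \mod N_{\hat\gamma}(\Gamma)_\theta,$$

where $N_{\hat\gamma}(\Gamma)_\theta$ is the subspace of $T^*_{\hat\gamma(\theta)}E$ generated by elements of $N_{\hat\gamma}(\Gamma)$ (which are
sections of $\hat\gamma^*T^*E$) evaluated at $\theta$. Therefore, one obtains

$$j\hat\gamma = \int_{\gamma\in\Gamma} d\gamma\,\lambda(\gamma|y)\,j\gamma \mod N_{\hat\gamma}(\Gamma),$$

which proves the proposition. □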

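3.2.2. Estimation of Linear Maps. Assume that both $\Theta$ and $\Lambda$ are isometrically
embedded in euclidean spaces $E_0$ and $E_1$ respectively. Let $\Gamma \subset \mathrm{Hom}(E_0,E_1)$ be a
manifold of linear maps that maps $\Theta$ to $\Lambda$. Inspection of the right-hand side of (32)
shows that $\bar\gamma$ is itself the restriction of a linear map to $\Theta$, so the bayesian estimator
$\hat\gamma$ can be described using only the geometry of $\mathrm{Hom}(E_0,E_1)$.

Define a positive semi-definite quadratic form on $\mathrm{Hom}(E_0,E_1)$ by

$$\langle\langle\alpha,\beta\rangle\rangle = \mathrm{Tr}(\alpha'\cdot\beta\cdot\tau) \qquad \forall\alpha, \beta \in \mathrm{Hom}(E_0,E_1),$$

where

$$\tau = \int_{\theta\in\Theta} \iota(\theta) \otimes \iota(\theta)'\,d\theta \in \mathrm{Hom}(E_0,E_0).$$

The first-order condition (32) implies that the bayesian estimator $\hat\gamma$ satisfies

$$\hat\gamma\cdot\tau = \bar\gamma\cdot\tau \mod N_{\hat\gamma}(\Gamma),$$

where $N_{\hat\gamma}(\Gamma)$ is the normal space to $T_{\hat\gamma}\Gamma$ in $\mathrm{Hom}(E_0,E_1)$.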

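Proposition 3.14. Let $\Gamma \subset \mathrm{Hom}(E_0,E_1)$ be a submanifold and the loss function $\ell$
be defined as in (31). Suppose that $\Theta$ spans $E_0$. If $\hat\gamma = g(y)$ is a bayesian estimator
as in (28), then the linear transformation

$$\bar\gamma := \int_{\gamma\in\Gamma} \gamma\,\lambda(\gamma|y)\,d\gamma \quad \text{satisfies} \quad \hat\gamma = \bar\gamma \mod N_{\hat\gamma}(\Gamma)\cdot\tau^{-1}. \qquad (34)$$

Proof. The only thing that remains to prove is that $\tau$ is non-degenerate if $\Theta$ spans
$E_0$. If $v \in \mathrm{Hom}(E_0,E_1)$ and

$$0 = \langle\langle v,v\rangle\rangle = \int_{\theta\in\Theta} |v\cdot\theta|^2\,d\theta, \quad \text{then } \Theta \subset \ker v.$$

Therefore, $E_0 = \mathrm{span}\,\Theta \subset \ker v$, so $v = 0$. □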

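Remark 3.15. Let $\Theta$ be the unit sphere in $E_0$. One computes that $\tau$ is a scalar
multiple of the identity matrix, whence condition (34) is simply that $\hat\gamma$ is the orthogonal
projection onto $\Gamma$ of $\bar\gamma$.

Let $\ell$ be the loss function on $\Gamma$ induced by the inner product on $\mathrm{Hom}(E_0,E_1)$:

$$\ell(\hat\gamma,\gamma) = |\hat\gamma - \gamma|^2 = \mathrm{Tr}\bigl((\hat\gamma - \gamma)'(\hat\gamma - \gamma)\bigr) \qquad \forall\hat\gamma, \gamma \in \Gamma.$$

When $E_0 = E_1$ and $\Gamma \subset \mathrm{O}(E)$, the loss function simplifies to $2s - 2\,\mathrm{Tr}(\hat\gamma'\gamma)$ where
$s = \dim E$.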
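
Remark 3.15 and the simplified loss can be illustrated with a toy posterior. The following is a minimal sketch, assuming $\Gamma = \mathrm{SO}(3)$ and a hypothetical discrete posterior supported on three rotations about a common axis (the text works with a posterior density; the discrete weights are only for illustration). The estimator is the projection of the posterior mean $\bar\gamma$ onto $\mathrm{SO}(3)$.

```python
import numpy as np

def project_to_so3(m):
    """Orthogonal projection of a 3 x 3 matrix onto SO(3) via the SVD."""
    u, _, vt = np.linalg.svd(m)
    if np.linalg.det(u) < 0:
        u[:, -1] *= -1
    if np.linalg.det(vt) < 0:
        vt[-1, :] *= -1
    return u @ vt

def rotation_about_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def loss(g_hat, g):
    """Squared trace norm |g_hat - g|^2."""
    diff = g_hat - g
    return float(np.trace(diff.T @ diff))

# Hypothetical discrete posterior: three rotations with weights summing to 1.
gammas = [rotation_about_z(t) for t in (0.0, 0.2, 0.5)]
weights = np.array([0.5, 0.3, 0.2])

gamma_bar = sum(w * g for w, g in zip(weights, gammas))   # posterior mean in Hom(E, E)
gamma_hat = project_to_so3(gamma_bar)                     # candidate bayesian estimator

def posterior_risk(g_hat):
    return sum(w * loss(g_hat, g) for w, g in zip(weights, gammas))

print(posterior_risk(gamma_hat), [posterior_risk(g) for g in gammas])
```

Because the loss equals $2s - 2\,\mathrm{Tr}(\hat\gamma'\gamma)$ on orthogonal matrices, the posterior risk is $2s - 2\,\mathrm{Tr}(\hat\gamma'\bar\gamma)$, and minimising it over $\mathrm{SO}(3)$ amounts to projecting $\bar\gamma$.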

3.2.3. The intrinsic distance loss function. Let $\hat\gamma, \gamma \in C^\infty(\Theta,\Lambda)$ be smooth maps
between the riemannian manifolds $(\Theta,g)$ and $(\Lambda,h)$. For each $\theta \in \Theta$, let $w(\theta) \in
T_{\hat\gamma(\theta)}\Lambda$ be a tangent vector to a shortest geodesic $c(s) = \exp_{\hat\gamma(\theta)}(s\cdot w(\theta))$ such that
$c(1) = \gamma(\theta)$.

If $\gamma(\theta)$ does not lie in the cut locus of $\hat\gamma(\theta)$, the tangent vector $w(\theta)$ is uniquely
defined and one may unambiguously write $w(\theta) = \log_{\hat\gamma(\theta)}(\gamma(\theta))$. It is apparent
that there are measurable maps $\theta \mapsto w(\theta)$, and any such map is smooth off the
above-mentioned set of "bad" points. In particular, if the graph of $\gamma$ lies in a tubular
neighbourhood of the graph of $\hat\gamma$, then the map $w$ is a uniquely defined, smooth
map.

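Let $\mathcal{L}_{\hat\gamma,\gamma} \subset \Theta$ be the set of points $\theta$ such that $\gamma(\theta)$ lies in the cut locus
of $\hat\gamma(\theta)$. If the measure of $\mathcal{L}_{\hat\gamma,\gamma}$ is zero, then compactness of $\Lambda$ implies that $w$ is
square-integrable. Therefore, one may define a one-form $\omega = \omega_{\hat\gamma,\gamma} \in T^*_{\hat\gamma}C^\infty(\Theta,\Lambda)$
by

$$\langle\omega, v\rangle = \int_{\theta\in\Theta} d\theta\cdot h(w(\theta), v(\theta))_{\hat\gamma(\theta)}$$

for each $v \in T_{\hat\gamma}C^\infty(\Theta,\Lambda)$.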
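Proposition 3.16. Let

$$\ell(\hat\gamma,\gamma) = \frac{1}{2}\int_{\theta\in\Theta} d\theta\cdot\mathrm{dist}(\hat\gamma(\theta),\gamma(\theta))^2,$$

where dist is the riemannian distance function of $(\Lambda,h)$. If $\mathcal{L}_{\hat\gamma,\gamma}$ has measure zero
and $\Lambda$ is compact, then $\partial\ell/\partial\hat\gamma$ exists at $(\hat\gamma,\gamma)$ and equals

$$\frac{\partial\ell}{\partial\hat\gamma} = -\omega_{\hat\gamma,\gamma}.$$

Proof. Let $\gamma_t$ be a curve of smooth maps such that $\gamma_{t=0} = \hat\gamma$ and $v = \frac{d}{dt}\big|_{t=0}\,\gamma_t$.
Let $\theta \in \Theta - \mathcal{L}_{\hat\gamma,\gamma}$ be fixed, and let $c_t(s)$ be the minimal geodesic from $\gamma(\theta)$ to $\gamma_t(\theta)$. It
is clear from figure 2 that the derivative of $\frac{1}{2}\mathrm{dist}(\gamma_t(\theta),\gamma(\theta))^2$ is $\langle c'(1), v(\theta)\rangle$ where
$c = c_0$. From the above discussion, it is clear that $c'(1) = -w(\theta)$.

If $\mathcal{L}_{\hat\gamma,\gamma}$ has zero measure, then the discussion above shows that $\partial\ell/\partial\hat\gamma$ exists at $(\hat\gamma,\gamma)$
and equals $-\omega_{\hat\gamma,\gamma}$. □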



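Let $\mathcal{L}_\Gamma \subset \Gamma$ be the set of maps $\hat\gamma$ such that $\int_{\gamma\in\Gamma}\int_{\theta\in\mathcal{L}_{\hat\gamma,\gamma}} d\theta\,d\gamma = 0$. By proposition
3.16, if $\hat\gamma \in \mathcal{L}_\Gamma$, then $\partial\ell/\partial\hat\gamma$ exists for almost all $(\gamma,\hat\gamma) \in \Gamma \times \{\hat\gamma\}$. The following
theorem is a consequence of Proposition 3.16 and Fubini's theorem.

Theorem 3.1. (1) If $\hat\gamma \in \mathcal{L}_\Gamma$ and $\hat\gamma = g(y)$ is a bayesian estimator as in
Proposition 3.11, then $\hat\gamma$ satisfies

$$\int_{\theta\in\Theta}\int_{\gamma\in\Gamma} d\theta\,d\gamma\ \lambda(\gamma|y)\,\bigl\langle \log_{\hat\gamma(\theta)}(\gamma(\theta)),\ v(\theta)\bigr\rangle = 0 \qquad \forall v \in T_{\hat\gamma}\Gamma.$$

(2) In particular, if $\hat\gamma \in \mathcal{L}_\Gamma$ satisfies the equation

$$\int_{\gamma\in\Gamma} d\gamma\ \lambda(\gamma|y)\,\log_{\hat\gamma(\theta)}(\gamma(\theta)) = 0 \mod N_{\hat\gamma}(\Gamma),$$

for a.a. $\theta \in \Theta$, then $\hat\gamma$ is a candidate for a bayesian estimator as in Proposition 3.11.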
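
Example 3.17. Let us examine an application of both parts of Theorem 3.1. Let
$\Theta = \Lambda = S^2 \subset \mathbb{E}^3$ be the unit sphere and let $\Gamma = \mathrm{SO}(3)$ be the group of orientation-preserving
isometries of $S^2$ with normalised Haar measure $d\gamma$.

(1) Since $\Gamma$ is a transitive group of isometries, the logarithm function is $\Gamma$-equivariant,
so part (1) of 3.1 implies that

$$w_{\hat\gamma}(\theta|y) = \int_{\gamma\in\mathrm{SO}(3)} d\gamma\ \lambda(\hat\gamma\gamma|y)\,\log_\theta(\gamma\theta)$$

must integrate to zero on $S^2$ against any vector field of the form $v(\theta) = \xi\cdot\theta$,
$\xi \in \mathfrak{so}(3)$. One uses the fact that $\log_\theta(\gamma\theta) = \frac{\alpha}{\sin\alpha} \times (\gamma\theta - \langle\gamma\theta,\theta\rangle\cdot\theta)$ (c.f. (26))
to compute that

$$\int_{\theta\in S^2}\int_{\gamma\in\mathrm{SO}(3)} d\theta\,d\gamma\ \lambda(\hat\gamma\gamma|y)\,\frac{\alpha}{\sin\alpha}\,\langle\gamma\theta, \xi\theta\rangle = -\frac{1}{3} \times \mathrm{Tr}(\tau(\hat\gamma)\cdot\xi), \qquad \cos\alpha = \langle\gamma\theta,\theta\rangle,$$

where

$$\tau(\hat\gamma) = 3\int_{\theta\in S^2}\int_{\gamma\in\mathrm{SO}(3)} d\theta\,d\gamma\ \lambda(\hat\gamma\gamma|y)\,\frac{\alpha}{\sin\alpha}\,\gamma\theta\theta' \qquad (\theta' = \text{transpose of } \theta)$$

is defined analogously to (32). Since $\int d\theta\ \theta \otimes \theta' = \frac{1}{3}I$, if the weight $\alpha/\sin\alpha$
were identically 1, then $\bar\gamma = \hat\gamma\,\tau(\hat\gamma)$ would coincide with that defined in (32).
It follows that if $\hat\gamma$ equals the bayesian estimator $g(y)$, then $\tau(\hat\gamma)$ must be
symmetric. In other words, $\hat\gamma$ is the orthogonal projection of $\bar\gamma(\hat\gamma) = \hat\gamma\cdot\tau(\hat\gamma)$
onto $\Gamma$, similar to (27).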

(2) On the other hand, let us investigate condition (2) of Theorem 3.1. Let
$\chi : (S^2)^k \to \mathrm{SO}(3)$ be an equivariant map and let the joint conditional
density of $y$ be $f(y|\gamma) = 1 + c\,\mathrm{Tr}(\gamma'\cdot\chi(y))$. Assume that the mean of
$\gamma'$ with respect to the bayesian prior $\lambda(\gamma)$ is zero. The posterior density
$\lambda(\gamma|y)$ is therefore equal to $f(y|\gamma)$.

To fix ideas, one may take $\chi(y) = \pi\bigl(\sum_{i=1}^{k} y_i \otimes \theta_i'\bigr)$, where $\pi : \mathfrak{gl}(3) \to
\mathrm{SO}(3)$ is the orthogonal projection, and $\lambda(\gamma) = 1$ for all $\gamma$.

Let $e \in S^2$ be a given point. Since $\Gamma$ acts transitively, one can write
$\theta = \alpha\cdot e$ for some $\alpha \in \Gamma$. One therefore finds that the vanishing of $w_{\hat\gamma}(\theta|y)$
is equivalent to the vanishing of

$$\int_{\gamma\in\mathrm{SO}(3)} d\gamma\ \lambda(\hat\gamma\alpha\gamma\alpha^{-1}|y)\,\log_e(\gamma\cdot e). \qquad (35)$$

If one introduces Euler angles on $\mathrm{SO}(3)$ relative to an orthonormal frame
$\{e_1, e_2, e_3 = e\}$, then one can write $\gamma_j = R_3(a_j)R_1(b_j)R_3(c_j)$ where $R_i(s)$
is a rotation in the plane orthogonal to $e_i$ counterclockwise by angle $s$. The
vanishing of (35) is equivalent to the vanishing of the multi-integral

$$\int_{[0,2\pi]^4\times[0,\pi]^2} \frac{da_1\,da_2\,dc_1\,dc_2\,db_1\,db_2}{64\pi^4} \times \sin(b_1)\sin(b_2)\,b_1 b_2\cos(a_1 - a_2)\,\lambda(\hat\gamma\alpha\gamma_1\alpha^{-1}|y)\,\lambda(\hat\gamma\alpha\gamma_2\alpha^{-1}|y) \qquad (36)$$

for every $\alpha \in \mathrm{SO}(3)$.

Let the special orthogonal matrix $\alpha^{-1}\hat\gamma'\chi\alpha$ be factorised as $R_3(x)R_1(y)R_3(z)$
in terms of Euler angles. Maxima computes the integral (36) to be $\pi^4c^2\sin(y)^2/256$
[12]. Therefore, the integral vanishes for all $\alpha$ iff $\hat\gamma'\chi = I$, i.e. $\hat\gamma = \chi$.
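
The value reported for (36) can be recovered by hand, since (36) is the squared norm of the vector (35). The following is a sketch of that computation, assuming the normalised Haar measure $\sin b\,da\,db\,dc/(8\pi^2)$ in '3-1-3' Euler angles and writing $M = \alpha^{-1}\hat\gamma'\chi\alpha$ and $V$ for the vector (35); the symbols $M$ and $V$ are introduced here for illustration only.

```latex
% Notation (introduced for this sketch): M = \alpha^{-1}\hat\gamma'\chi\alpha,
% \gamma = R_3(a)R_1(b)R_3(c), e = e_3, d\gamma = \sin b\,da\,db\,dc/(8\pi^2).
% The scalar c of the density f is distinct from the third Euler angle, which
% is integrated out first and leaves only the third column \gamma e of \gamma.
\begin{align*}
\gamma e &= (\sin a\sin b,\ -\cos a\sin b,\ \cos b), \qquad
\log_e(\gamma e) = b\,(\sin a,\ -\cos a,\ 0), \\
V &= \int d\gamma\ \bigl(1 + c\,\mathrm{Tr}(\gamma' M)\bigr)\log_e(\gamma e)
   = \frac{c}{4\pi}\int_0^{2\pi}\!\!\int_0^{\pi} da\,db\ \sin b\ \langle Me, \gamma e\rangle\ b\,(\sin a,\ -\cos a,\ 0), \\
V_1 &= \frac{c\,M_{13}}{4\pi}\int_0^{2\pi}\sin^2 a\,da\int_0^{\pi} b\sin^2 b\,db
     = \frac{c\,M_{13}}{4\pi}\cdot\pi\cdot\frac{\pi^2}{4} = \frac{\pi^2 c}{16}\,M_{13}, \qquad
V_2 = \frac{\pi^2 c}{16}\,M_{23}, \qquad V_3 = 0, \\
|V|^2 &= \frac{\pi^4 c^2}{256}\,\bigl(M_{13}^2 + M_{23}^2\bigr) = \frac{\pi^4 c^2}{256}\,\sin^2 y,
\qquad\text{since } M_{13} = \sin x\sin y,\ M_{23} = -\cos x\sin y .
\end{align*}
```

The constant term of the density contributes nothing because $\int_0^{2\pi}\sin a\,da = \int_0^{2\pi}\cos a\,da = 0$, and the result agrees with the value $\pi^4c^2\sin(y)^2/256$ quoted above.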